Distributed logic analyzer for use in a hardware logic emulation system

ABSTRACT

A hardware emulation system is disclosed which reduces hardware cost by time-multiplexing multiple design signals onto physical logic chip pins and printed circuit board traces. The reconfigurable logic system of the present invention comprises a plurality of reprogrammable logic devices, and a plurality of reprogrammable interconnect devices. The logic devices and interconnect devices are interconnected together such that multiple design signals share common I/O pins and circuit board traces. A logic analyzer for a hardware emulation system is also disclosed. The logic circuits necessary for executing logic analyzer functions are programmed into the programmable resources in the logic chips of the emulation system.

SPECIFICATION

[0001] 1. Field of the Invention

[0002] The present invention relates in general to apparatus for verifying electronic circuit designs and more specifically to hardware emulation systems in which multiple design signals are carried on a single physical wire between programmable logic chips.

[0003] 2. Background of the Invention

[0004] Hardware emulation systems are devices designed for verifying electronic circuit designs prior to fabrication as chips or printed circuit boards. These systems are typically built from programmable logic chips (logic chips) and programmable interconnect chips (interconnect chips). The term “chip” as used herein refers to integrated circuits. Examples of logic chips include reprogrammable logic circuits such as field-programmable gate arrays (“FPGAs”), which include both off-the-shelf products and custom products. Examples of interconnect chips include reprogrammable FPGAs, multiplexer chips, crosspoint switch chips, and the like. Interconnect chips can be either off-the-shelf products or custom designed.

[0005] Prior art emulation systems have generally been designed so that each signal in an electronic circuit design to be emulated is mapped to one or more physical metal lines (“wires”) within a logic chip. Signals which must go between logic chips are mapped to one or more physical pins on a logic chip and one or more physical traces on printed circuit boards which contain the logic and interconnect chips.

[0006] The one-to-one mapping of design signals to physical pins and traces in prior art emulation systems leads to the requirement that the emulation system contain at least as many logic chip pins and printed circuit board traces as there are design signals to be routed between logic chips. Such an arrangement requires the use of very complex and expensive integrated circuit packages, printed circuit boards and circuit board connectors to construct the emulation system. The high cost of these components, which in turn increases the cost of the hardware logic emulation system, is a factor in limiting the number of designers who can afford, and therefore benefit from, the advantages provided by hardware emulation systems.

[0007] Furthermore, integrated circuit fabrication technology is allowing the use of ever decreasing feature sizes. Thus, the logic density of logic chips (i.e., the number of logic gates that can be implemented therein) has increased dramatically. The increase in the number of logic gates that can be implemented or emulated in a single logic chip, however, has not been met with an increase in the number of pins (i.e., leads) available for inputs, outputs, clocks and the like on the chip's package. The number of pins on an integrated circuit package is limited by the available perimeter of the chip. Furthermore, the capability of the wire-bonding assembly equipment used to connect the bonding pads on integrated circuit dice to the pins on the package has increased slowly over time. Thus, there is an increasing mismatch between the amount of logic available on a logic chip and the number of pins available to connect the logic to the outside world. This results in poor average utilization of the logical capacity of the logic chips, which increases the cost of a hardware emulation system necessary for emulation of a given sized electronic circuit design.

[0008] Time-multiplexing is a technique that has been used for sharing a single physical wire or pin between multiple logical signals in certain types of systems where the cost of each physical connection is very high. Such systems include telecommunication systems. Time-multiplexing, however, has not been commonly used in hardware emulation systems such as those available from Quickturn Design Systems, Inc., Mentor Graphics Corporation, Aptix Corporation, and others because the use of prior art time-multiplexing methods significantly reduced the speed at which the emulated circuit could operate. Furthermore, prior art time-multiplexing techniques make it difficult to preserve the correct asynchronous behavior of an embedded design in the hardware emulation system.

[0009] As discussed, one function of hardware emulation systems is to verify the functionality of an integrated circuit. Typically, when a circuit designer or engineer designs an integrated circuit, the design is represented in the form of a “netlist” description of the design. A netlist description (or “netlist”, as it is referred to by those of ordinary skill in the art) is a description of the integrated circuit's components and electrical interconnections between the components. The components include all those circuit elements necessary for implementing a logic circuit, such as combinational logic (e.g., gates) and sequential logic (e.g., flip-flops and latches). Prior art emulation systems analyzed the user's circuit netlist prior to implementing the netlist into the hardware emulation system. This analysis included the steps of separating the various circuit paths of the design into clock paths, clock qualifiers and data paths. A method for performing this analysis and separation is disclosed in U.S. Pat. No. 5,475,830 by Chen et al., which is assigned to the same assignee as the present invention. The disclosure of U.S. Pat. No. 5,475,830 is incorporated herein by reference in its entirety. The techniques disclosed in U.S. Pat. No. 5,475,830 have been used in prior art emulation systems such as the System Realizer™ brand hardware emulation system from Quickturn Design Systems, Inc., Mountain View, Calif. However, the techniques disclosed therein have not been used in combination with any type of time-multiplexing.

[0010] Other prior art hardware emulation systems such as those available from Virtual Machine Works (now IKOS), ARKOS (now Synopsys) and IBM have attempted to use time-multiplexing of design signals onto a single physical logic chip pin and printed circuit board trace to seek lower hardware cost for a given size of electronic design to be emulated. These prior art emulation systems, however, alter or re-synthesize clock paths in an attempt to maintain correct circuit behavior. This alteration or re-synthesis process works predictably for synchronous designs. However, altering or re-synthesizing the clock paths in an asynchronous design can lead to inaccurate or misleading emulation results. Since most circuit designs have asynchronous clock architectures, the need to alter or re-synthesize the clock paths is a large disadvantage.

[0011] In addition, prior art hardware emulation machines using time-multiplexing have suffered from low operating speed. This is a consequence of re-synthesizing the clock paths. In these machines, a number of internal machine cycles are required to emulate one clock cycle of a design. Thus, the effective operating speed for the emulated design is typically many times slower than the maximum clock rate of the emulation system itself. If there are multiple asynchronous clocks in the design to be emulated, the slowdown typically becomes even worse because of the need to evaluate the state of the emulated design between each pair of input clock edges.

[0012] Prior art hardware emulation machines using time-multiplexing also require complex software for synchronizing the flow of many design signals over a single physical logic chip pin or printed circuit board trace. Each design signal must be timed so that it has the correct value at the instant it is needed in other parts of the system to compute other design signals. This timing analysis software (also known as scheduling software) adds to the complexity of the emulator and to the time needed to compile a circuit design into the emulator.

[0013] Furthermore, prior art hardware emulation machines which use time-multiplexing only use a simple form of time-multiplexing which requires minimal hardware but uses a large amount of power (e.g., current) and requires a complex system design.

[0014] Thus, there is a need for a hardware emulation system which has very high logical capacity, fast compile times, less complex software, simplified mechanical design and reduced power consumption.

SUMMARY OF THE INVENTION

[0015] A new type of hardware emulation system is disclosed and claimed which reduces hardware cost by time-multiplexing multiple design signals onto physical logic chip pins and printed circuit board traces but which does not have the limitations of low operating speed and poor asynchronous performance. Additional methods to multiplex multiple signals onto a single physical interconnection which are suitable for hardware emulation but do not have the disadvantages of high power and complex system design are also disclosed.

[0016] In the preferred embodiment, time-multiplexing is performed on clock qualifier paths (a clock qualifier is any signal which is used to gate a clock signal) and data paths in a design but not on clock paths (a clock path is the path between the clock signal and the clock source from which the clock signal is derived).

[0017] The reconfigurable logic system of the present invention comprises a plurality of reprogrammable logic devices, each having internal circuitry which can be reprogrammably configured to provide at least combinatorial logic elements and storage elements. The programmable logic devices also have programmable input/output terminals which can be reprogrammably interconnected to selected ones of functional elements of the logic devices. The reprogrammable logic devices have input demultiplexers and output multiplexers implemented at each input/output terminal. The input demultiplexers receive a time-multiplexed signal and divide it into one or more internal signals. The output multiplexers combine one or more internal signals onto a single physical interconnection.

[0018] The invention also comprises a plurality of reprogrammable interconnect devices, each of which has input/output terminals and internal circuitry which can be reprogrammably configured to provide interconnections between selected input/output terminals. The reprogrammable interconnect devices also have input demultiplexers and output multiplexers implemented at each input/output terminal. The input demultiplexers receive a time-multiplexed input signal and divide it into one or more component signals. The output multiplexers combine one or more component signals onto a second single physical interconnection.

[0019] The invention also comprises a set of fixed electrical conductors connecting the programmable input/output terminals on the reprogrammable logic devices to the input/output terminals on the reprogrammable interconnect devices.

[0020] In another aspect of the present invention, a logic analyzer is integrated into the logic emulation system which provides complete visibility of the design undergoing emulation. The logic analyzer of the present invention is distributed, in that its components are integrated into many of the resources of the emulation system. The logic analyzer of the present invention comprises at least scan chains programmed into each of the logic chips of the logic boards. The scan chains are comprised of at least one flip-flop. The scan chains are programmably connectable to selected subsets of sequential logic elements of the design undergoing emulation.

[0021] The logic analyzer further comprises at least one memory device which is in communication with the scan chain. This memory device stores data from the sequential logic elements of the logic design undergoing emulation. Control circuitry communicates with the logic chips of the emulation system and generates logic analyzer clock signals which clock the scan chains. The control circuitry also generates trigger signals when predetermined combinations of signals occur in the logic chips.

[0022] The above and other preferred features of the invention, including various novel details of implementation and combination of elements, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular methods and circuits embodying the invention are shown by way of illustration only and not as limitations of the invention. As will be understood by those skilled in the art, the principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] Reference is made to the accompanying drawings in which are shown illustrative embodiments of aspects of the invention, from which novel features and advantages will be apparent.

[0024] FIG. 1 is a block diagram showing a partial crossbar network incorporating time-multiplexing.

[0025] FIG. 2 is a timing drawing showing the signal relationships for two-to-one time-multiplexing.

[0026] FIG. 3 is a block diagram showing the circuitry necessary in an FPGA to do two-to-one time-multiplexing. FIG. 4 is a block diagram showing the equivalent circuitry in a multiplexing chip.

[0027] FIG. 5 is a timing diagram showing the signal relationships necessary for four-to-one time-multiplexing.

[0028] FIG. 6 is a block diagram showing the logic necessary in an FPGA to do four-to-one time-multiplexing.

[0029] FIG. 7 is a block diagram showing the equivalent circuitry in a multiplexing chip.

[0030] FIG. 8 is a timing diagram showing the signal relationships for a pulse width encoding scheme suitable for a hardware emulation system.

[0031] FIG. 9 is a timing diagram showing the signal relationships for a phase encoding scheme suitable for a hardware emulation system.

[0032] FIG. 10 is a timing diagram showing the signal relationships for a serial data encoding scheme suitable for a hardware emulation system.

[0033] FIG. 11 is a block diagram of a logic board of a preferred embodiment of the present invention.

[0034] FIG. 12 is a block diagram of the interconnection among the various circuit boards of a preferred embodiment of the present invention.

[0035] FIG. 13 is a diagram showing the physical construction of a preferred embodiment of the present invention.

[0036] FIG. 14 is a block diagram of the interconnection among the various circuit boards of a version of the presently preferred emulation system that has less logical capacity.

[0037] FIG. 15 is a diagram showing the physical construction of the emulation system of FIG. 14 that has one logic board and one I/O board.

[0038] FIG. 16 is a block diagram of an I/O board and core board.

[0039] FIG. 17 is a block diagram of a mux board.

[0040] FIG. 18 is a block diagram of an expandable mux board.

[0041] FIG. 19 is a block diagram showing how the user clocks are distributed in a preferred hardware emulation system of the present invention.

[0042] FIG. 20 is a block diagram showing the control structure of a preferred hardware emulation system of the present invention.

[0043] FIG. 20a is a block diagram of the logic analyzer of a preferred embodiment of the present invention.

[0044] FIG. 20b is a block diagram showing the data path for logic analyzer signals of a preferred embodiment of the present invention.

[0045] FIG. 20c is a block diagram showing how logic analyzer events are distributed in the logic chips in a preferred embodiment of the present invention.

[0046] FIG. 20d is a logic diagram showing how probed signals are computed from storage elements and external input values.

[0047] FIG. 21 is a flow chart showing how to program a preferred embodiment of the hardware emulation system of the present invention.

[0048] FIG. 22 is a flow diagram showing the sequence of steps necessary for the compilation of a software-hardware model created by a behavioral testbench compiler according to a preferred embodiment of the present invention.

[0049] FIG. 22a is a block diagram showing an example of a memory circuit that could be generated by the LCM memory generator of a preferred embodiment of the present invention.

[0050] FIG. 23 is a block diagram of a netlist structure created to represent special connections of the co-simulation logic to a microprocessor event synchronization bus according to a preferred embodiment of the present invention.

[0051] FIG. 24a is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0052] FIG. 24b is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0053] FIG. 24c is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0054] FIG. 24d is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0055] FIG. 24e is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0056] FIG. 24f is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0057] FIG. 24g is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0058] FIG. 24h is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0059] FIG. 24i is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0060] FIG. 24j is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0061] FIG. 24k is a schematic diagram of a time-division-multiplexing cell which may be inserted depending on the type of the I/O pins of a logic chip in a preferred embodiment of the present invention.

[0062] FIG. 25 is a block diagram of an event detection cell of a preferred embodiment of the present invention.

[0063] FIG. 26 is a schematic diagram showing how the outputs of AND trees are time-multiplexed pairwise using special event-multiplexing cells in accordance with an embodiment of the present invention.

[0064] FIG. 27 is a block diagram of an event detector download circuit of a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE DRAWINGS

[0065] Turning to the figures, the presently preferred apparatus and methods of the present invention will now be described.

[0066] FIG. 1 shows a portion of the partial crossbar interconnect for a preferred embodiment of a hardware emulation system of the present invention. Embodiments of a partial crossbar interconnect architecture have been described in U.S. Pat. Nos. 5,036,473, 5,448,496 and 5,452,231 by Butts et al., which are assigned to the same assignee as the present invention. The disclosures of U.S. Pat. Nos. 5,036,473, 5,448,496 and 5,452,231 are incorporated herein by reference in their entirety. In a partial crossbar interconnect, the input/output pins of each logic chip are divided into proper subsets, using the same division on each logic chip. The pins of each Mux chip (also known as a crossbar chip) are connected to the same subset of pins from each logic chip. Thus, crossbar chip ‘n’ is connected to subset ‘n’ of each logic chip's pins. As many crossbar chips are used as there are subsets, and each crossbar chip has as many pins as the number of pins in the subset times the number of logic chips. Each logic chip/crossbar chip pair is interconnected by as many wires, called paths, as there are pins in each subset.
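The subset rule described above can be illustrated with a short sketch. The function name `partial_crossbar_wiring`, its parameters, and the assignment of consecutive pins to a subset are illustrative assumptions and do not appear in the referenced patents; the sketch only shows that crossbar chip n collects subset n of every logic chip's pins.

```python
def partial_crossbar_wiring(num_logic_chips, pins_per_logic_chip, num_subsets):
    """Return {crossbar_chip: [(logic_chip, pin), ...]} for a partial crossbar.

    Hypothetical helper: pins are assigned to subsets in consecutive runs,
    which is only one of many possible divisions.
    """
    if pins_per_logic_chip % num_subsets != 0:
        raise ValueError("pins must divide evenly into subsets")
    subset_size = pins_per_logic_chip // num_subsets
    wiring = {n: [] for n in range(num_subsets)}
    for chip in range(num_logic_chips):
        for pin in range(pins_per_logic_chip):
            # The same division is used on every logic chip.
            subset = pin // subset_size
            # Crossbar chip 'n' receives subset 'n' of each logic chip's pins.
            wiring[subset].append((chip, pin))
    return wiring


# Small example: four logic chips with two pins each, split into two subsets.
wiring = partial_crossbar_wiring(num_logic_chips=4, pins_per_logic_chip=2, num_subsets=2)
```

In this small example each of the two crossbar chips needs four pins, which is the subset size (one) times the number of logic chips (four).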

[0067] The partial crossbar interconnect of FIG. 1 comprises a number of reprogrammable interconnect blocks 12, which in a preferred embodiment are multiplexer chips (Mux chips). The partial crossbar interconnect of FIG. 1 further comprises a number of reprogrammable configurable logic chips 10, which in a presently preferred embodiment are field-programmable gate arrays (FPGAs). Each Mux chip 12 has one or more connections to each logic chip 10. In the preferred embodiment described in Butts, each design signal going from a logic chip to a Mux chip takes one physical interconnection. In other words, one signal on a pin from a logic chip 10 is interconnected to one pin on a Mux chip 12. In the embodiments of the present invention, each physical interconnection in the partial crossbar architecture may represent one or more design signals.

[0068] Each Mux chip 12 comprises a crossbar 22, together with a number of input demultiplexers 24 and output multiplexers 26. The input demultiplexers 24 take a time-multiplexed input signal and divide it into one or more component signals. The component signals are routed separately through the crossbar 22. They are then multiplexed again, in the same or a different combination, by an output multiplexer circuit 26. In a preferred embodiment, time-multiplexed signals are not routed through the Mux chip crossbar 22. By not routing time-multiplexed signals through the Mux chip crossbar 22, the flexibility of routing the partial crossbar network is increased because input signals and output signals to the Mux chips may be combined in different combinations. This also reduces the power consumption of the Mux chip 12 since the time-multiplexing frequency is typically much higher than the average switching rate of the component signals.

[0069] The logic chips 10 also comprise a plurality of input demultiplexers 34 and output multiplexers 36. The output multiplexers 36 take one or more internal logic chip 10 signals and combine them onto a single physical interconnection. The input demultiplexers 34 take a time-multiplexed signal and divide it into one or more internal logic chip 10 signals. In the presently preferred embodiment, these multiplexers 36 and demultiplexers 34 are constructed using the internal configurable logic blocks of a commercially available off-the-shelf FPGA. However, they could be constructed using input/output blocks of a reprogrammable logic chip custom designed for emulation.

[0070] FIG. 1, for illustrative purposes only, shows two Mux chips 12 with four pins each and four logic chips 10 with two pins each. The actual embodiments of preferred hardware emulation systems would have more of each type of chip and each chip would have many more pins. The actual number of Mux chips 12, logic chips 10, and number of pins on each is purely a matter of design choice, and is dependent on the desired gate capacity to be achieved. In a presently preferred embodiment, each printed circuit board contains fifty-four Mux chips 12 and thirty-seven logic chips 10. The presently preferred logic chips 10 are the 4036XL FPGA (also known as a “logic cell array”), which is manufactured by Xilinx Corporation. It should be noted, however, that other reprogrammable logic chips such as those available from Altera Corporation, Lucent Technologies, or Actel Corporation could be used. In the presently preferred embodiment, thirty-six of the logic chips 10 make five connections to each of the fifty-four Mux chips 12. This means that five of the pins of each of these thirty-six logic chips 10 have a physical electrical connection to five pins of each of the fifty-four Mux chips 12. The thirty-seventh logic chip 10 makes three connections to each of the fifty-four Mux chips 12. This means that three of the pins of this thirty-seventh logic chip 10 have a physical electrical connection to three pins of each of the fifty-four Mux chips 12.
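The connection counts implied by these figures can be worked out directly. The following is only an arithmetic sketch using the numbers stated above; the variable names and the `signal_capacity` helper are illustrative, the count covers only the logic-chip-to-Mux-chip connections on one board, and the capacity figure assumes every such connection carries the same number of time-multiplexed design signals, which need not hold in practice.

```python
MUX_CHIPS = 54              # Mux chips per printed circuit board
FULL_LOGIC_CHIPS = 36       # logic chips making five connections per Mux chip
FULL_CONNECTIONS = 5
LAST_CHIP_CONNECTIONS = 3   # the thirty-seventh logic chip

# Pins on each Mux chip that face logic chips.
pins_per_mux_chip = FULL_LOGIC_CHIPS * FULL_CONNECTIONS + LAST_CHIP_CONNECTIONS

# Pins on each logic chip that face Mux chips.
pins_per_full_logic_chip = MUX_CHIPS * FULL_CONNECTIONS
pins_on_last_logic_chip = MUX_CHIPS * LAST_CHIP_CONNECTIONS

# Total logic-chip-to-Mux-chip connections on one board.
board_connections = MUX_CHIPS * pins_per_mux_chip


def signal_capacity(connections, signals_per_connection):
    """Design signals carried if every connection is multiplexed equally (assumption)."""
    return connections * signals_per_connection
```

Under these assumptions each Mux chip uses 183 pins for logic chip connections, each of the thirty-six logic chips uses 270 pins, and the board has 9,882 such connections, so uniform two-to-one multiplexing would carry up to 19,764 design signals over the same wiring.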

[0071] FIG. 2 shows an example of a timing diagram for a two-to-one time-multiplexing emulation system in which internal logic chip Signal A 40 and internal logic chip Signal B 42 are multiplexed onto a single Output Signal 46. A Mux Clock Signal 44 is divided by two to produce Divided Clock Signal 50. A SYNC- Signal 48 is used to synchronize the Mux Clock divider 68 (see FIG. 3) so that the falling edge of Mux Clock Signal 44 sets Divided Clock Signal 50 to zero if SYNC- Signal 48 is low. Divided Clock Signal 50 is used to sample internal Signal A 40 on each rising edge. This sample is placed into a storage element such as a flip-flop or latch (shown in FIG. 3). The same Divided Clock Signal 50 is used to sample internal Signal B 42 on each falling edge. This sample is placed into another flip-flop or latch (shown in FIG. 3). In a preferred embodiment, the Mux Clock Signal 44 may be asynchronous to Signal A 40 and Signal B 42. When the value on the Divided Clock Signal 50 is high, the previously sampled Signal A 40 is then transferred to the Output Signal 46. When the Divided Clock Signal 50 is low, the previously sampled Signal B 42 is transferred to the Output Signal 46.
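The two-to-one scheme can be approximated by a cycle-level behavioral model. This is a minimal sketch under simplifying assumptions, not the circuit of the figures: each loop iteration stands for one Mux Clock cycle, the divided clock starts low, the SYNC- reset and all propagation delays are ignored, and the receiving side simply captures the wire in the same iteration. The function name `simulate_two_to_one` and its arguments are hypothetical.

```python
def simulate_two_to_one(a_values, b_values):
    """Model two-to-one time-multiplexing of signals A and B onto one wire.

    a_values, b_values: values of A and B, one entry per Mux Clock cycle.
    Returns (wire_trace, received_a, received_b).
    """
    divided_clock = 0            # assumed to start low (no SYNC- modelled)
    sampled_a = sampled_b = 0    # sender-side storage elements
    received_a = received_b = None
    wire_trace = []
    for a, b in zip(a_values, b_values):
        divided_clock ^= 1       # divide the Mux Clock by two
        if divided_clock == 1:
            sampled_a = a        # rising edge of divided clock samples A
        else:
            sampled_b = b        # falling edge of divided clock samples B
        # Divided clock high sends the A sample, low sends the B sample.
        wire = sampled_a if divided_clock else sampled_b
        wire_trace.append(wire)
        # Receiver demultiplexes using the same divided clock phase.
        if divided_clock:
            received_a = wire
        else:
            received_b = wire
    return wire_trace, received_a, received_b


trace, rx_a, rx_b = simulate_two_to_one([1, 1, 1, 1], [0, 0, 0, 0])
```

With A held at 1 and B held at 0, the wire alternates between the two values while each receiver register settles on its own signal, which illustrates why the multiplexing frequency is higher than the switching rate of either component signal.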

[0072] Referring now to FIG. 3, the logic implemented in FPGA 10 of a presently preferred embodiment which creates the timing of the signals shown in FIG. 2 is shown in detail. A two-to-one clock divider 68 divides the Mux Clock Signal 44 to produce Divided Clock Signal 50. Clock divider 68 comprises flip-flop 68 a, AND gate 68 b and inverter 68 c. Divided Clock Signal 50 is input to an output multiplexer 66 (see multiplexer 36 of FIG. 1) and the input demultiplexer 64 (see demultiplexer 34 of FIG. 1). The clock divider 68 is reset periodically by the SYNC- Signal 48 to ensure that all the clock dividers in the system are synchronized. The input demultiplexer 64 is composed of two flip-flops 65 a and 65 b. Flip-flops 65 a and 65 b are clocked by Mux Clock 44. One flip-flop (e.g., 65 b) is enabled when Divided Clock Signal 50 is high and the other (e.g., 65 a) is enabled when Divided Clock Signal 50 is low. Divided Clock Signal 50 is not used directly as a clock to the flip-flops 65 a, 65 b in input demultiplexer 64 and in the output multiplexer 66 to conserve low-skew lines in the FPGA 10. The output of either flip-flop 65 a or 65 b provides a static demultiplexed design signal to the core 62 of the FPGA 10 (the core 62 of the FPGA 10 comprises the configurable elements used to implement the logic functions of the user's design). The output multiplexer 66 comprises two flip-flops 67 a, 67 b which are clocked by Mux Clock 44. Output multiplexer 66 also comprises two-to-one multiplexer 67 c. One flip-flop (e.g., flip-flop 67 b) is enabled when Divided Clock Signal 50 is high and the other flip-flop (e.g., flip-flop 67 a) is enabled when Divided Clock Signal 50 is low. The two-to-one multiplexer 67 c selects the output Q of either flip-flop 67 a or flip-flop 67 b to appear on the output pin.

[0073] Corresponding circuitry for the Mux chip 12 is shown in FIG. 4. Unlike the circuitry in the logic chip 10, the output multiplexer 76 (see multiplexer 26 of FIG. 1) in the Mux chip 12 is constructed without flip-flops and therefore comprises two-to-one multiplexer 76 a. This is possible because in the preferred embodiment, delays through the Mux chip 12 are short. To save additional logic, flip-flops 74 a, 74 b in the input demultiplexer 74 (see demultiplexer 24 of FIG. 1) do not have enable inputs. Instead, the Divided Clock Signal 50 is used to clock the flip-flops 74 a, 74 b directly. Clock divider 78 is preferably comprised of flip-flop 78 a, AND gate 78 b, and inverter 78 c. The clock divider 78, the input demultiplexer 74, and the output multiplexer 76 operate similarly to the corresponding elements in FIG. 3.

[0074] Since it is not known in advance whether an input/output (“I/O”) pin on a Mux chip 12 will be an input or an output for a given design, all I/O pins in the Mux chips 12 include both an input demultiplexer 74 and an output multiplexer 76.

[0075] Using the concepts of the present invention, it is possible to do four-to-one time-multiplexing where a pin is an input for a time then an output for a time. FIG. 5 shows a timing diagram for four-to-one time-multiplexing. Just as for two-to-one time-multiplexing, there is a Mux Clock Signal 44 and a SYNC- Signal 48. The Mux Clock Signal 44 is divided by two to produce a Divided Clock Signal 50. The divider is synchronously reset when the SYNC- Signal 48 is low and a falling edge occurs on the Mux Clock Signal 44. In addition, there is a Direction Signal 80 which is produced by dividing the Divided Clock Signal 50 again by two. The Direction Signal 80 controls whether the pin is an input or an output at each instant in time. Four enable signals E0 90, E1 92, E2 94 and E3 96 are used to enable individual flip-flops in the logic chips 10 as will be described later. These four signals are derived from the Divided Clock Signal 50 and the Direction Signal 80.

[0076] The Divided Clock Signal 50 samples the External Signal 98 to produce Internal Input Signal E 86 and Internal Input Signal F 88 when the Direction Signal 80 is low. When Direction Signal 80 is low, it signifies that the pin is operating in an input direction. Internal Input Signal E 86 is produced by sampling on the rising edge of Divided Clock Signal 50. Internal Input Signal F 88 is produced by sampling on the falling edge of Divided Clock Signal 50. When Direction Signal 80 is high, the pin receiving it operates as an output. Internal Output Signal C 82 is output onto External Signal 98 when Divided Clock Signal 50 is low and Internal Output Signal D 84 is output when Divided Clock Signal 50 is high.
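The resulting four time slots can be tabulated with a small sketch. This is one reading of the description, not a reproduction of the timing diagram: it assumes the Direction Signal and Divided Clock Signal behave as the two bits of a counter that starts at zero, and that an input value sampled on a rising (falling) edge of the divided clock is the value present on the pin while that clock was low (high). The function names are hypothetical.

```python
def slot_owner(direction, divided_clock):
    """Return (pin role, internal signal) for one time slot of a four-to-one pin."""
    if direction == 0:
        # Pin is an input: E is captured at the rising edge of the divided
        # clock (value present while it was low), F at the falling edge.
        return ("input", "E") if divided_clock == 0 else ("input", "F")
    # Pin is an output: C is driven while the divided clock is low, D while high.
    return ("output", "C") if divided_clock == 0 else ("output", "D")


def four_to_one_schedule(num_mux_cycles):
    """List the slot owner for each Mux Clock cycle, starting from a reset state."""
    schedule = []
    for count in range(num_mux_cycles):
        divided_clock = count & 1        # Mux Clock divided by two
        direction = (count >> 1) & 1     # divided clock divided by two again
        schedule.append(slot_owner(direction, divided_clock))
    return schedule
```

Under these assumptions `four_to_one_schedule(4)` yields the input slots for E and F followed by the output slots for C and D, and the pattern repeats every four Mux Clock cycles.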

[0077] Referring now to FIG. 6, the logic implemented in logic chip 10 to create the timing signals of FIG. 5 is shown in detail. A clock divider 104 divides the Mux Clock Signal 44 to produce Divided Clock Signal 50 and Direction Signal 80. Clock divider 104 is comprised of flip-flops 104 a and 104 b, AND gates 104 c and 104 d, inverter 104 e, EXCLUSIVE-OR gate 104 f, and AND gates 104 g-104 j. The clock divider 104 is reset periodically by the SYNC- Signal 48 to ensure that all the clock dividers 104 in the system are synchronized. In addition, the clock divider circuit 104 also produces Enable Signals E0 90, E1 92, E2 94 and E3 96. These signals are used as enables in the input/output multiplexer circuits 100 and 102.

[0078] The input/output multiplexer circuit 100 has timing corresponding to the diagram of FIG. 5. The External Signal 98 is an input when Direction Signal 80 is low. Signal E 86 and Signal F 88 are sampled from External Signal 98 when Enable Signal E0 90 and Enable Signal E1 92 are active and placed in flip-flops 100 a and 100 b respectively. Signals D 84 and C 82 are saved in flip-flops 100 c and 100 d when enable signals E2 94 and E1 92 are active. A preferred input/output multiplexer circuit 100 is also comprised of multiplexer 100 e and buffer 100 f.

[0079] Multiplexer 100 e and buffer 100 f cause signals D 84 and C 82, previously saved in flip-flops 100 c and 100 d, to appear successively on External Signal 98 when Direction Signal 80 is high.

[0080] The input/output multiplexer circuit 102 is similar except that the timing has been altered so that Signal 106 is an output when Direction Signal 80 is low. Input/output multiplexer 102 is preferably comprised of flip-flops 102 a-102 d, multiplexer 102 e and buffer 102 f. The input/output multiplexer 100 is referred to herein as an “inout” multiplexer while the multiplexer 102 is referred to herein as an “outin” multiplexer. When pins are connected together in a system, an inout pin must always be connected to an outin pin so that one pin is driving while the other is listening (i.e., ready to receive or receiving a signal).
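The pairing rule can be expressed as a simple check. This is an illustrative sketch only: the string labels and the function names `drives` and `connection_is_valid` are assumptions, and the model considers nothing but the level of the Direction Signal.

```python
def drives(pin_kind, direction):
    """Return True if a four-to-one pin drives the wire for this direction level."""
    if pin_kind == "inout":
        return direction == 1    # inout: output while the Direction Signal is high
    if pin_kind == "outin":
        return direction == 0    # outin: output while the Direction Signal is low
    raise ValueError(f"unknown pin kind: {pin_kind}")


def connection_is_valid(kind_a, kind_b):
    """A connection is valid if exactly one end drives in every direction phase."""
    return all(drives(kind_a, d) != drives(kind_b, d) for d in (0, 1))
```

Connecting two inout pins (or two outin pins) fails the check because both ends would drive in one phase and both would listen in the other.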

[0081] Corresponding 4-way time-multiplexing circuitry is shown in FIG. 7 for mux chip 12. Clock divider 132 produces Divided Clock Signal 50 and Direction Signal 80. Clock divider 132 is comprised of flip-flops 132 a, 132 b, AND gates 132 c, 132 d, inverter 132 e and EXCLUSIVE-OR gate 132 f. As in the logic chip 10, there is an inout multiplexer 120 and an outin multiplexer 122. Inout multiplexer 120 is preferably comprised of flip-flops 120 a, 120 b, two-to-one multiplexer 120 c and buffer 120 d. Inout multiplexer 120 has the timing shown in FIG. 5. External Signal 98 is an input when Direction Signal 80 is low and an output when Direction Signal 80 is high. Internal signals 124 and 126 are sampled from External Signal 98 when Direction Signal 80 is low. Internal signals 128 and 130 are output onto External Signal 98 when Direction Signal 80 is high.

[0082] Outin multiplexer circuit 122 is similar except that the timing has been altered so that signal 134 is an input when Direction Signal 80 is high and an output when Direction Signal 80 is low. Outin multiplexer 122 is preferably comprised of flip-flops 122 a, 122 b, two-to-one multiplexer 122 c and buffer 122 d. An outin pin on a mux chip 12 must connect to an inout pin on another mux chip 12 or a logic chip 10. Additional configuration bits (not shown in FIG. 7) make it possible to programmably configure any pin of the mux chip 12 to be either non-multiplexed, two-to-one multiplexed as either an input or an output, or four-to-one multiplexed as either an inout or an outin pin. This is done by selectively forcing Direction Signal 80 to always be low (for a two-to-one input), always be high (for a two-to-one output), be non-inverted (for an inout four-to-one pin as in 120), or be inverted (for an outin four-to-one pin as in 122). Additionally, External Signal 98 can be directly connected to core signal 124 for a non-multiplexed input. Core outputs 128 and 130 can be directly connected to the input and enable pins of buffer 120 d for a non-multiplexed output.
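The pin configurations described above can be summarized in a short behavioral sketch. It is illustrative only and is not the circuit of FIG. 7: the names `PinMode` and `effective_direction` are hypothetical, and a `True` result is assumed to mean that the pin drives External Signal 98 while `False` means that it listens.

```python
from enum import Enum


class PinMode(Enum):
    """Hypothetical labels for the multiplexed pin configurations in the text."""
    INPUT_2TO1 = "two-to-one input"
    OUTPUT_2TO1 = "two-to-one output"
    INOUT_4TO1 = "four-to-one inout"
    OUTIN_4TO1 = "four-to-one outin"


def effective_direction(mode: PinMode, direction_signal: bool) -> bool:
    """Return the direction level seen by the pin (True = drive, False = listen)."""
    if mode is PinMode.INPUT_2TO1:
        return False                  # direction forced always low
    if mode is PinMode.OUTPUT_2TO1:
        return True                   # direction forced always high
    if mode is PinMode.INOUT_4TO1:
        return direction_signal       # direction passed through non-inverted
    if mode is PinMode.OUTIN_4TO1:
        return not direction_signal   # direction inverted
    raise ValueError(f"unsupported mode: {mode}")


# An inout pin wired to an outin pin: exactly one side drives in each phase.
for phase in (False, True):
    inout = effective_direction(PinMode.INOUT_4TO1, phase)
    outin = effective_direction(PinMode.OUTIN_4TO1, phase)
    print(f"direction={int(phase)} inout drives={inout} outin drives={outin}")
```

The loop shows why an inout pin must be paired with an outin pin: the two effective directions are always complementary, so one pin drives while the other listens.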

[0083] Although the preferred embodiment incorporates two-to-one and four-to-one time-multiplexing, the technique disclosed could be extended to allow multiplexing by any other factor that the designer might choose. In general, higher multiplexing factors result in slower emulation speed but allow simpler and lower-cost hardware because the physical wires and pins can be shared among more logical design signals.

[0084] Furthermore, there are many other methods for multiplexing multiple bits of information onto a single physical wire which could be used in an emulation system. Examples of these techniques are pulse-width modulation, phase modulation and serial data encoding. The choice of which technique to use in a particular embodiment is a matter of the designer's choice and depends on the tradeoffs between operating speed, cost, power consumption and complexity of the logic required.

[0085] One aspect of these more complex encoding schemes which is important in a hardware emulation system is the ability to reduce power consumption. A hardware emulation system typically will have many thousands of interconnect paths. To minimize delay through the system, it is desirable to switch these interconnect paths as rapidly as possible between different logical design signals. Power consumption of the system, however, is largely determined by the speed at which the interconnect paths are switched. In a large system, generating and distributing power and removing the resulting heat can significantly increase the complexity and cost of the system. It is therefore desirable to have a multiplexing scheme which operates quickly but does not require large amounts of power. One way in which power dissipation could be minimized is by only transferring design signal information when design data changes rather than transferring design signal information continuously, as is done in the presently preferred embodiment.

[0086] Another important aspect to consider when choosing an encoding scheme is the ability to have interconnections which operate asynchronously to each other or asynchronously to a master multiplexing clock. In the simple form of time-multiplexing described above for the presently preferred embodiment, a master multiplexing clock must be distributed with low skew to all logic chips 10 and Mux chips 12 in the system. In addition, the master multiplexing clock must be run slowly enough that signals have time to pass over the longest interconnect path in the system. At the same time, there must be no hold-time violations for the shortest interconnect path in the system. A hold-time violation could occur if a transmitting device removed a data signal before a receiving device had properly saved it into a flip-flop or latch. The requirement for a low-skew master clock significantly increases the complexity and cost of the emulation system. In addition, the requirement to not have hold-time violations on the shortest possible data path while ensuring sufficient time for signals to pass over the longest possible data path means that the multiplexing clock must operate relatively slowly. As explained earlier, this is undesirable because it limits the effective operating speed of the emulation system.

[0087] The inventive concepts described above with respect to the simplest form of time-multiplexing are equally applicable to more complex encoding schemes, which will now be seen. Encoding schemes using pulse-width modulation, phase-shift modulation and serial encoding can reduce power consumption and increase the relatively low operating speed intrinsic to the simplest form of time-multiplexing. The disadvantage of all of these schemes (relative to simple time-multiplexing) is that they require significantly more encoding and decoding logic and, for that reason, simple time-multiplexing was used in the presently preferred embodiment. As the cost of digital logic decreases relative to the cost of physical pins and circuit board traces, one or more of these more complex encoding schemes will likely be used in the future.

[0088] Referring to FIG. 8, a form of pulse-width modulation is shown which would be suitable for a hardware emulation system. The External Signal 146 is normally low. When a transition occurs on a Design Signal 140 or 142, a pulse is emitted on the External Signal 146. A High Speed Asynchronous Clock Signal 144 is distributed to all chips in the system. Unlike the Mux Clock 44 described earlier with reference to FIG. 2, Asynchronous Clock Signal 144 need not be synchronized between any two chips in the system or even between two pins on the same chip.

[0089] Therefore, there is no need for a SYNC Signal 48 as described earlier with reference to FIG. 2. Also, Asynchronous Clock Signal 144 may operate at any speed as long as the minimum pulse width produced on External Signal 146 will pass through the interconnect without undue degradation. The pulse emitted on External Signal 146 may have a width of one, two, three or four clocks depending on whether the two Design Signals 140 and 142 had values of 00, 01, 10 or 11 when a signal transition occurred. Asynchronous Clock Signal 144 must, however, be sufficiently fast that five clock cycles always elapse between successive edges of Design Signals 140 and 142 to ensure that information is not lost. Design Signals 140 and 142 are recovered from External Signal 146 by counting the number of Asynchronous Clock 144 cycles that occur each time External Signal 146 goes high. In an actual embodiment, Asynchronous Clock 144 would operate at twice or three times the speed shown to ensure that recovered signals could be unambiguously distinguished. Additional circuitry would also be added to periodically transfer data, even in the absence of design signal transitions, in order for the design to initialize properly.
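The pulse-width mapping just described can be illustrated with a small behavioral model. This is a sketch under stated assumptions rather than the circuit of FIG. 8: the function names are hypothetical, External Signal 146 is modeled as one sample per Asynchronous Clock 144 cycle, and Design Signal 140 is assumed to be the more significant of the two bits.

```python
FRAME_CLOCKS = 5  # five clocks between design-signal edges: up to 4 high + 1 low


def encode_pulse(a: int, b: int) -> list[int]:
    """Return one frame of External Signal samples for design values (a, b).

    Values 00, 01, 10, 11 map to pulse widths of 1, 2, 3, 4 clocks.
    """
    width = 1 + 2 * a + b
    return [1] * width + [0] * (FRAME_CLOCKS - width)


def decode_pulse(samples: list[int]) -> tuple[int, int]:
    """Recover (a, b) by counting the clocks for which the signal stays high."""
    width = 0
    for sample in samples:
        if sample == 0:
            break
        width += 1
    if not 1 <= width <= 4:
        raise ValueError(f"invalid pulse width: {width}")
    value = width - 1
    return value >> 1, value & 1


for a in (0, 1):
    for b in (0, 1):
        frame = encode_pulse(a, b)
        print(a, b, frame, decode_pulse(frame))
```

Because the widest pulse occupies four clocks, the fifth clock of each frame is always low, which is one way to read the requirement that five clock cycles elapse between successive design-signal edges.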

[0090] Since Design Signals 140 and 142 transition relatively infrequently on average compared to Asynchronous Clock 144, power consumption will be low compared to the continuous time-multiplexing scheme described earlier. In addition, this encoding scheme is not affected by varying amounts of delay between the transmitting circuit and the receiving circuit.

[0091] The logic circuitry necessary to implement this pulse-width encoding scheme could be designed by one skilled in the art of circuit design and thus will not be further discussed here. It is noted, however, that one skilled in the art could design logic circuits having many different variations while still achieving the same function. For example, three design signals could be encoded onto one External Signal 146 instead of two. Also, different encodings of Design Signals 140 and 142 could be used or the default value of External Signal 146 could be one instead of zero.

[0092] The pulse-width modulation encoding scheme described with reference to FIG. 8 suffers from the following limitations. In a pulse-width modulation encoding scheme, the pulse width must be measured from a rising edge on External Signal 146 to a falling edge on External Signal 146. However, when a signal passes through many levels of routing chips, a rising edge will often be delayed by a different amount than a falling edge. The speed of the signal multiplexing may, therefore, need to be slowed down to ensure that signal values can still be distinguished after passing through many levels of routing chips. Also, the modulation scheme of FIG. 8 is sensitive to unavoidable momentary signal transitions or glitches on External Signal 146 which may cause false signal values to be transmitted.

[0093] Referring to FIG. 9, a form of phase modulation is shown which would be suitable for a hardware emulation system. An internal phase-locked loop (“PLL”) circuit continuously counts from zero to three (shown as PLL Count 150 in FIG. 9) using Asynchronous Clock 144 as an input. The PLL circuit may be of a type commonly known as a digital phase-locked loop (“DPLL”) which is relatively easy to construct with complementary metal oxide semiconductor (“CMOS”) integrated circuit technology. When a transition occurs on Design Signals 140 or 142, External Signal 152 makes a transition at a time which depends on the value of Design Signals 140 and 142. For example, after the first transition on Signal A 140, both Signal A 140 and Signal B 142 will be high. External Signal 152 therefore makes a transition when the PLL is at count 3 (A, B=11). Later, after a transition on Signal B 142, Signal A 140 will be high and Signal B 142 will be low. External Signal 152 therefore makes a transition when the PLL is at count 2 (A, B=10).
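The relationship between the design values and the PLL count at which External Signal 152 toggles can be sketched as follows. The model is a simplification under stated assumptions: the function names are hypothetical, one sample is taken per PLL count, the transmitting and receiving PLLs are assumed to be already synchronized, and sync pulses are not modeled.

```python
PLL_COUNTS = 4  # the PLL counts 0, 1, 2, 3 and then wraps


def encode_frame(a: int, b: int, level: int) -> tuple[list[int], int]:
    """Return one PLL cycle of External Signal samples and the new line level.

    The line toggles at PLL count 2*a + b, so (A, B) = 11 toggles at count 3
    and (A, B) = 10 toggles at count 2, matching the example in the text.
    """
    count = 2 * a + b
    frame = [level] * count + [1 - level] * (PLL_COUNTS - count)
    return frame, 1 - level


def decode_frame(frame: list[int], prev_level: int):
    """Return (a, b) from the count at which the line toggled, or None."""
    for count, sample in enumerate(frame):
        if sample != prev_level:
            return count >> 1, count & 1
    return None  # no transition in this PLL cycle, so no new data


level = 0
for a, b in [(1, 1), (1, 0)]:
    frame, new_level = encode_frame(a, b, level)
    print((a, b), frame, decode_frame(frame, level))
    level = new_level
```

Only one edge is produced per transferred value, which is consistent with the lower transition count the text attributes to phase modulation relative to the pulse-width scheme.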

[0094] The receiving circuit has a matching PLL which is kept synchronized to the transmitting PLL by sync pulses which are sent periodically when no data needs to be transferred. A sync pulse consists of two transitions occurring at time zero and time two of the transmitting PLL. A sync pulse may be recognized by the receiving PLL because it is the only time when two transitions occur on External Signal 152 within one PLL cycle. The sync pulse causes the receiving PLL to adjust its count gradually so that it becomes synchronized with the transmitting PLL after several sync pulses have occurred. The sync pulse need only occur relatively infrequently compared to transitions on Signal A 140 and Signal B 142, so power consumption is not greatly increased. In an actual embodiment, Asynchronous Clock 144 would operate at a relative speed two or three times what is shown in FIG. 9 to have sufficient resolution to clearly distinguish between the different edge transition times on External Signal 152. Alternatively, the phase-locked loop could be run at a multiple of the frequency of Asynchronous Clock 144 to increase resolution. Also, circuitry would be included to periodically transmit the value of Design Signal A 140 and Design Signal B 142 even if no transition had occurred, so that the design would initialize properly.

[0095] The circuitry necessary to implement the digital phase-locked loops and the transmit and receiving circuitry used in this phase encoding scheme could be designed by one skilled in the art of circuit design and thus will not be further discussed here.

[0096] The phase modulation encoding scheme discussed above has several advantages over the pulse-width modulation scheme discussed earlier (see FIG. 8). Fewer transitions on External Signal 152 are required to transmit values of Signal A 140 and Signal B 142 than is the case for External Signal 146 in FIG. 8. This reduces the power consumed by the system. Also, the circuit can be made less sensitive to noise because glitches or short pulses are treated as sync pulses and have only a gradual effect on PLL Count 150. In addition, separate PLL counters can be used to time rising edges and falling edges since the sync pulse always includes one rising and one falling edge. By timing the rising and falling edges separately, Asynchronous Clock 144 can be run at a very high frequency and External Signal 152 can be passed through many intermediate routing chips without affecting the ability to reliably recover Signal A 140 and Signal B 142.

[0097] The main disadvantage of phase modulation, however, is that it requires a relatively large amount of digital logic to implement.

[0098] Many variations of the phase modulation encoding scheme disclosed herein are possible without deviating from the teachings of the invention. For example, the PLL could recognize eight or sixteen transition times rather than just four. Also, additional design signals could be transmitted by creating more than one edge on External Signal 152 each time a transition occurred on a design signal. For example, Design Signals A and B could be transmitted on a first edge of External Signal 152 and Design Signals C and D could be transmitted on a second edge of External Signal 152. This has the effect of transmitting more data on External Signal 152 but at a lower speed.

[0099] Referring to FIG. 10, another form of modulation is shown which would also be useful in a hardware emulation system. This technique is known as serial data encoding. Many common protocols such as RS232 use a variation of serial data encoding. When Design Signal A 140 or Design Signal B 142 makes a transition, a serial string of data is transmitted on External Signal 162. A start bit which is always zero signifies that a transmission is about to occur. Next, the values of Signal A 140 and Signal B 142 are transmitted successively. Finally, a stop bit which is always a one is transmitted. The receiving circuitry uses Asynchronous Clock Signal 144 to delay one and one-half clocks from the falling edge of the start bit before sampling External Signal 162 to recover Signal A 140. It then delays an additional clock before sampling External Signal 162 again to recover Signal B 142. In an actual embodiment, Asynchronous Clock 144 would operate at a relative frequency several times higher than that shown in FIG. 10 to sample External Signal 162 accurately at the center point when Signal A 140 and Signal B 142 are being transmitted.
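The framing and sampling points of the serial scheme can be modeled in a few lines. This is an illustrative sketch, not the circuit of FIG. 10: the function names are hypothetical, External Signal 162 is modeled at half-clock resolution (two samples per bit) so that the one-and-one-half-clock delay can be expressed as a whole number of samples, and the line is assumed to idle high.

```python
SAMPLES_PER_BIT = 2  # assumed half-clock resolution: two samples per bit time


def encode_serial(a: int, b: int) -> list[int]:
    """Return the frame: start bit (0), Signal A, Signal B, stop bit (1)."""
    bits = [0, a, b, 1]
    return [bit for bit in bits for _ in range(SAMPLES_PER_BIT)]


def decode_serial(samples: list[int]) -> tuple[int, int]:
    """Find the falling edge of the start bit, then sample A and B mid-bit."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            a = samples[i + 3]  # 1.5 clocks after the edge: inside bit A
            b = samples[i + 5]  # one further clock later: inside bit B
            return a, b
    raise ValueError("no start bit found")


idle = [1, 1]  # the line rests high before the start bit
for a in (0, 1):
    for b in (0, 1):
        line = idle + encode_serial(a, b)
        print((a, b), line, decode_serial(line))
```

Sampling away from the bit boundaries is what gives the receiver tolerance to delay variation; a real design would oversample more finely, as the text notes.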

[0100] The circuitry necessary to implement the serial data encoding scheme could be designed by one skilled in the art of circuit design and thus will not be further discussed here.

[0101] Serial data encoding has the advantage that relatively simple digital logic may be used. It has the disadvantage, however, that several edges on External Signal 162 are required to transmit each change to Design Signal A 140 and Design Signal B 142. This means that the data rate is relatively low and the power consumption relatively high compared to other techniques.

[0102] Many variations of the serial data encoding scheme disclosed herein are possible without deviating from the teachings of the invention. For example, values for more than two design signals could be transmitted each time a design signal makes a transition.

[0103] Any of the encoding techniques shown in FIGS. 8-10 could be further improved by the addition of some form of error checking technique. Since design data is only transmitted when a design signal changes, transmission errors will result in wrong data values being latched by the receiving circuits and in a higher probability of incorrect operation of the emulation system. Common error detection and correction techniques such as parity or cyclic redundancy checking (CRC) could be used.
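As a minimal illustration of the parity option mentioned above, the sketch below appends an even-parity bit to a group of transmitted design-signal values and checks it on receipt. The function names are hypothetical, and the text does not specify where a parity bit would be placed in any of the encodings of FIGS. 8-10.

```python
def add_even_parity(bits: list[int]) -> list[int]:
    """Append a bit that makes the total number of ones even."""
    return bits + [sum(bits) % 2]


def check_even_parity(frame: list[int]) -> bool:
    """Return True when the frame, parity bit included, has an even number of ones."""
    return sum(frame) % 2 == 0


frame = add_even_parity([1, 0])   # Signal A = 1, Signal B = 0
print(frame, check_even_parity(frame))

corrupted = frame.copy()
corrupted[1] ^= 1                 # a single-bit transmission error
print(corrupted, check_even_parity(corrupted))
```

Single parity detects any odd number of bit errors but cannot correct them; a receiver that detects an error would need some other means, such as the periodic retransmission described earlier, to recover the correct value.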

[0104] The system aspects of a preferred embodiment will now be disclosed in more detail. Referring to FIG. 11, a block diagram of the logic board 200 of a preferred embodiment is shown incorporating logic chips (which in the presently preferred embodiment are FPGAs) and Mux chips 12. The logic board 200 has a partial crossbar interconnection similar to that disclosed in Butts et al. The main difference is that the partial crossbar of the presently preferred embodiment of the present invention is not completely uniform because logic chip 204, which will be discussed below, has fewer connections with Mux chips 12 than do the other logic chips. In the presently preferred embodiment, there are fifty-four Mux chips 12 with two-hundred and sixty I/O pins on each and thirty-six logic chips (FPGAs) 10 with two-hundred and seventy I/O pins each. The presently preferred embodiment utilizes FPGAs as logic chips 10 with the part number XC4036XL manufactured by Xilinx Corporation, San Jose, Calif., U.S.A. Each of the thirty-six logic chips 10 has five connections to each of the fifty-four Mux chips 12. A thirty-seventh logic chip 204, known herein as the co-simulation (CoSim) logic chip, has three connections to each of the fifty-four Mux chips 12. In a presently preferred embodiment, this thirty-seventh logic chip 204 is also an FPGA manufactured by Xilinx having part number 4036XL. Additional pins (not shown) on Mux chips 12 and logic chips 10 and 204 are reserved for downloading, clock distribution, and other system functions. The purpose of CoSim logic chip 204 will be discussed below. Any of the Mux chip 12 to logic chip 10 connections may be non-multiplexed, multiplexed two-to-one, or multiplexed four-to-one by programming the Mux chips 12 and logic chips 10 appropriately.
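The connection counts given above can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable and function names are hypothetical, and the last calculation assumes that a pin multiplexed N-to-one carries N design signals, which is an inference from the description of time-multiplexing rather than a stated capacity figure.

```python
MUX_CHIPS = 54
LOGIC_CHIPS = 36
LINKS_PER_LOGIC_CHIP = 5   # connections from each logic chip 10 to each Mux chip 12
LINKS_PER_COSIM_CHIP = 3   # connections from CoSim logic chip 204 to each Mux chip 12

# Pins on one logic chip used for Mux chip connections.
logic_chip_pins = LINKS_PER_LOGIC_CHIP * MUX_CHIPS

# Pins on the CoSim logic chip used for Mux chip connections.
cosim_chip_pins = LINKS_PER_COSIM_CHIP * MUX_CHIPS

# Pins on one Mux chip used for logic chip and CoSim chip connections.
mux_pins_to_logic = LOGIC_CHIPS * LINKS_PER_LOGIC_CHIP + LINKS_PER_COSIM_CHIP


def design_signals(physical_pins: int, mux_factor: int) -> int:
    """Design signals carried when every pin is multiplexed mux_factor-to-one."""
    return physical_pins * mux_factor


print(logic_chip_pins, cosim_chip_pins, mux_pins_to_logic)
print(design_signals(logic_chip_pins, 4))
```

The first result agrees with the two-hundred and seventy I/O pins stated for each logic chip 10. The Mux chip figure is smaller than its two-hundred and sixty I/O pins, leaving the remainder for the backplane, turbo and SGRAM connections described below.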

[0105] In addition to the interconnections discussed above, CoSim logic chip 204 is also in electrical communication with a processor 206. In the presently preferred embodiment, the processor 206 is a PowerPC 403GC chip available from IBM Corporation. Processor 206 is used for co-simulation, which is described in copending application Ser. No. 08/733,352, entitled Method And Apparatus For Design Verification Using Emulation And Simulation, to Sample et al. The teachings of application Ser. No. 08/733,352 are incorporated herein by reference. Processor 206 is also used for diagnostic functions and downloading information to Mux chips 12, logic chips 10, 204 and RAM 208 (discussed below) and SGRAM 210 (discussed below). The processor 206 is connected through a VME interface (not shown) to backplane connector 220. Twelve of the logic chips 10 also have connections to a 32K by 32 static random access memory (RAM) chip 208. This RAM chip 208 is used for implementing large memories which may be part of an emulated circuit. The RAM 208 is attached to some of the lines also connecting the logic chips 10 to the Mux chips 12. In this way, if the RAM is not used, the logic chip 10 to Mux chip 12 connections can be used for ordinary interconnect functions and are not lost. If the RAM 208 is needed for implementing memory that is part of a particular netlist, the logic chip 10 that communicates with it has a RAM controller function programmed into it.

[0106] The Mux chips 12 also have connections to a backplane connector 220 and a turbo connector 202. Backplane and turbo connections may also be either non-multiplexed, multiplexed two-to-one, or multiplexed four-to-one. The turbo connector 202 is used to electrically connect two logic boards 200 together in a sandwich. By providing direct connections between two logic boards in a pair, the number of backplane connections required for a particular design may be reduced. The backplane connector must fit along one edge of the logic board and the number of possible backplane connections is limited by the types of connectors available. If there are insufficient backplane connections, the partitioning software will not be able to operate efficiently, thereby reducing the logic capacity of the board. Two emulation boards connected in a sandwich are shown in FIG. 13. If a smaller emulation system comprising fewer than two emulation boards is desired, a special turbo loopback board having no logic disposed thereon is used. In such a system, the special turbo loopback board simply routes signals from turbo connector 202 to backplane connector 220. An example of a configuration using a turbo loopback board is shown in FIG. 15.

[0107] In addition, the Mux chips 12 have eight connections each to a set of synchronous graphics RAMs (SGRAMs) 210. These SGRAMs 210 are used to form the data path of a distributed logic analyzer. Design signals may be sampled in the logic chips 10 and CoSim logic chip 204 and routed through Mux chips 12, then saved in SGRAMs 210 for future analysis by the user. The logic analyzer is disclosed further below.

[0108] Logic chips 10 and CoSim logic chip 204 are also attached to an event bus 212 and a clock in bus 214. The event bus is used to route event signals from within logic chips 10 and CoSim logic chip 204 to logic analyzer control circuitry (shown in FIG. 20a, discussed below). The event bus consists of four signals and is time-multiplexed two-to-one to provide eight event signals. The signals on the event bus 212 are buffered and then routed to additional pins (not shown) on backplane connector 220.

[0109] The clock in bus 214 consists of eight low-skew special purpose clock nets which are routed to all logic chips 10 and CoSim logic chip 204 (discussed below). The clock in bus 214 is used to distribute clock signals as is explained in U.S. Pat. No. 5,475,830. Clocks from clock in bus 214 may come directly through buffer 216 from signals 218 which are connected to additional pins (not shown) on backplane connector 220 or they may be created by combining primary clock signals 218 with logic in CoSim logic chip 204. When CoSim logic chip 204 is used for implementing clock logic, it is acting as a “clock generation FPGA” as explained in U.S. Pat. No. 5,475,830.

[0110] Referring now to FIG. 12, the interconnection among boards is shown. Logic boards 200 are assembled into pairs which are connected through turbo connectors 202. The logic boards are also connected through the backplane connectors 220 (shown in FIG. 11) to a switching backplane 420. The switching backplane 420 is comprised of mux boards 400 which are disposed at right angles to the logic boards. An arrangement of logic boards and switching boards can be seen in U.S. Pat. No. 5,352,123 to Sample et al. and assigned to the same assignee as the present application. U.S. Pat. No. 5,352,123 is incorporated herein by reference in its entirety. The switching backplane 420 also connects to I/O boards 300 (only one I/O board 300 is shown in FIG. 12; however, the use of more than one I/O board 300 is contemplated as part of the present invention). I/O boards 300 serve the functions of routing and buffering signals from external devices contained on core board 500 or external system 540. They also have the ability to provide stimulus signals to all external pins so that the emulated design can be operated in the absence of an external device or system.

[0111] I/O boards 300 connect through core board 500 and repeater pod 520 to an external system 540. To simplify FIG. 12, the actual numbers of boards and connections have been reduced. In a presently preferred embodiment, there are twenty-two mux boards 400, one to ten pairs of logic boards 200 and up to eight I/O boards 300. In the presently preferred embodiment, if more than two I/O boards 300 are used, a pair of logic boards 200 is lost for each additional pair of I/O boards 300. In the presently preferred embodiment, each I/O board 300 has one associated core board 500 which has up to seven repeater pods 520 which are attached to cables. Each repeater pod in the presently preferred embodiment buffers eighty-eight bidirectional signals.

[0112] FIG. 13 shows the physical construction of the preferred embodiment system. Mux boards 400 are disposed at right angles to logic boards 200 and I/O boards 300. Backplane 800 has connectors on one side for mux boards 400 and on the other side for logic boards 200 or I/O boards 300. To simplify the drawing, only one mux board 400 and three pairs of logic boards 200 are shown. However, in a presently preferred embodiment, there are, in fact, twenty-two mux boards 400 and up to eleven pairs of logic boards 200 or I/O boards 300. I/O boards 300 attach to core boards 500 through connector 330. Core boards 500 have an external connector which attaches through a cable to repeater pod 520 and external system 540 (not shown in FIG. 13). A power board 240 converts from a forty-eight volt DC main power supply to the 3.3 volts necessary to power the logic board. This type of distributed power conversion is made necessary by the time-multiplexing circuitry's high demand for power. In addition, the system contains a control board 600 and a CPU board 700 (see FIG. 20). In a presently preferred embodiment, the CPU board 700 is a VME bus Power-PC processor board available from Themis Computer and others. Other similar processor boards would be suitable. The selection of the particular processor board to use depends on tradeoffs between cost, speed, RAM capacity and other factors. The CPU board 700 provides a network interface and overall control of the emulation system. The control board 600 provides clock distribution, downloading and testing functions for the other boards as well as centralized functions of the logic analyzer and pattern generator (the structure and function of which will be discussed below).

[0113] A smaller version of the presently preferred emulation system can also be constructed. A block diagram of this system is shown in FIG. 14. The smaller system does not have a switching backplane 420. Instead, logic boards 200 are connected directly together and to I/O boards 300. This is possible because the size of the system is limited to two pairs of logic boards 200 and one pair of I/O boards 300. The backplane connections are shown at the top of FIG. 14. Pins from each backplane connector 220 (shown in FIG. 11) are divided into four equal groups. Each group is routed through the backplane to one of the two logic boards 200 not in the same pair and to each I/O board 300. It is not necessary to make connections through the backplane to the other logic board 200 or I/O board 300 in the same pair because this connection is provided through the turbo connector 202 in the case of logic boards 200 and is not necessary in the case of I/O boards 300. The connection pattern shown in FIG. 14 provides sufficient richness for good routability between boards but avoids the high cost of a switching backplane 420. As in the large system, I/O board 300 is connected through core board 500 and repeater pod 520 to an external system 540. To simplify the drawing, core board 500, repeater pod 520, and external system 540 are not shown for the second I/O board although they are, in fact, present. By using a set of additional boards to make connections between otherwise unused backplane and turbo connectors, versions of the small system may be constructed with one to four logic boards 200 and either one or two I/O boards 300. These additional boards are turbo loopback board 260 and backplane loopback board 280 shown in FIG. 15. Neither of these boards has any digital logic on it. They simply route signals between connectors.

[0114] A physical drawing of the small system with one logic board and one I/O board is shown in FIG. 15. Backplane 802 provides the connections described earlier with reference to FIG. 14. I/O board 300 is connected through connector 330 to core board 500. Core boards 500 have an external connector 510 which attaches through a cable to repeater pod 520 and external system 540 (not shown in FIG. 15). A power board 240 converts from a forty-eight volt DC main power supply to the 3.3 volts necessary to power the logic board. In addition, the small system contains a control board 600 and a CPU board 700 as in the large system described earlier with reference to FIG. 13. To preserve routing connections when fewer than four logic boards are used, turbo loopback board 260 connects signals from unused turbo connectors 202 (shown in FIG. 11) of logic board 200 to the backplane 802. The turbo loopback board 260 is used when there are either one or three logic boards 200 in the system. An additional pair of backplane loopback boards 280 is used to preserve routing connections through the backplane when there are unused logic board slots. This occurs when there are either one or two logic boards in the system. The backplane loopback boards 280 connect the groups of backplane signals (shown in FIG. 14) to each other so that no signals are lost when there are otherwise vacant backplane connectors.

[0115] A block diagram of I/O board 300 and core board 500 is shown in FIG. 16. A first row 301 of Mux chips 12 is attached to backplane connector 320 on I/O board 300. To simplify the drawing, only three Mux chips 12 are shown in the first row 301. In a presently preferred embodiment, however, there are fourteen Mux chips 12 in the first row 301. A second row 303 of Mux chips 12 connects to the first row 301 of Mux chips 12 as well as to field effect transistors (FETs) 308 and logic chips 304. Again, the drawing has been simplified to only show two Mux chips 12. In a presently preferred embodiment, there are twelve Mux chips 12 in the second row 303. Two rows 301, 303 of Mux chips 12 are required to achieve sufficient routing flexibility so that any arbitrary external signal can be connected to any pin of repeater cable connectors 510 on core board 500. Logic chip 304 also is attached to synchronous graphics RAM (SGRAM) 302. In a presently preferred embodiment, logic chip 304 is an FPGA. Although only one logic chip 304 and one SGRAM 302 are shown, in a presently preferred embodiment, there are six logic chips 304 and three SGRAMs 302 on I/O board 300. Logic chips 304 and SGRAMs 302 provide the capability of driving stimulus vectors into the emulator on any external connection pin. When driving stimulus vectors, FETs 308 are turned off (i.e., are opened) so the stimulus will not conflict with signals from an external system which may be attached through repeater pods 520 to connectors 510. When not driving stimulus vectors, pins of logic chips 304 are tristated and FETs 308 are turned on (i.e., closed) so that signals on connectors 510 may drive or receive signals from second row mux chips 303. In the presently preferred embodiment, logic chips 304 are the XC5215, which is available from Xilinx Corporation, San Jose, California, although other programmable logic chips could be used with satisfactory results. In addition to the components shown in FIG. 16, the I/O board 300 contains a processor chip (not shown) which is connected through a VME interface to backplane connector 320. In a presently preferred embodiment, this processor chip is a PowerPC 403GC from IBM Corporation, although other microprocessor chips could be used with satisfactory results. The processor chip attaches through processor bus 310 to logic chips 304. Processor bus 310 serves to upload stimulus information into SGRAMs 302. The processor is used for diagnostic functions and for uploading and downloading information from Mux chips 12, logic chips 304 and SGRAMs 302.

[0116] Connector 330 attaches core board 500 to I/O board 300. In addition to logic signals coming from FETs 308, this connector 330 receives JTAG signals and is electrically connected to the VME bus. The JTAG signals are for downloading and testing repeater pods 520 which may be plugged into connectors 510. In a presently preferred embodiment, the VME bus is not used with core board 500. However, it is contemplated that the VME bus could be used with other types of boards which may be plugged into connector 330. For example, it is contemplated that a large memory board might be plugged into connector 330 to provide the ability to emulate memories larger than will fit into RAMs 208 (shown in FIG. 11).

[0117] Referring now to FIG. 17, a block diagram of mux board 400 is shown. Mux chips 12 attach to backplane connector 420 in a distributed fashion. The drawing of FIG. 17 has been simplified to show only four Mux chips 12. However, in a presently preferred embodiment, there are, in fact, seven Mux chips 12 on mux board 400. Furthermore, there are many more connections to Mux chips 12 than are shown in FIG. 17. These additional connections are arranged similarly to the ones shown. In addition to the Mux chips 12 shown in FIG. 17, mux board 400 contains a JTAG interface (not shown) attached to backplane connector 420 which allows the Mux chips 12 to be downloaded and tested.

[0118] The mux board of FIG. 17 is suitable for a non-expandable emulation system. It is often desirable, however, to connect several emulation systems together to form a larger capacity emulation system. In this case, an expandable version of mux board 400 is used. A block diagram of an expandable mux board 402 is shown in FIG. 18. A first row 404 of Mux chips 12 is electrically connected to backplane connector 420. The drawing has been simplified to show only four Mux chips 12 in the first row 404. However, in a presently preferred embodiment, there are ten Mux chips 12 in the first row 404. The first row 404 of Mux chips 12 is electrically connected to a second row 406 of Mux chips 12 and to a turbo connector 430. The second row 406 of Mux chips 12 is also electrically connected to turbo connector 430 and to external connectors 440. Only two Mux chips 12 are shown in the second row 406, and only two external connectors 440 are shown in FIG. 18. However, in a presently preferred embodiment, there are five Mux chips 12 in the second row 406. Furthermore, there are six external connectors 440 in the presently preferred embodiment. Each external connector 440 of the presently preferred embodiment has ninety-two I/O pins. Mux boards 402 are assembled into pairs which are attached together through turbo connector 430. Turbo connector 430 acts to expand the effective intersection area between a pair of mux boards 402 and a pair of logic boards 200. Without the turbo connector 430, the intersection area is too small for effective routability between external connectors 440 and logic boards 200.

[0119] With reference to FIG. 19, the manner in which user clocks are distributed in the emulation system is described. Distribution of user clocks is important in emulation system design. As is discussed in U.S. Pat. No. 5,475,830, it is necessary to ensure that user clocks arrive at the logic chips 10 on emulation boards 200 before data signals, assuming that the user clocks and data signals change at the same time in external system 540 (external system 540 is shown in FIGS. 12 and 14). It is possible to satisfy this requirement by delaying the data signals. This solution, however, reduces the maximum operating speed of the emulation system. A more desirable alternative is to make the user clock distribution network as fast as possible so that minimal, if any, delay needs to be added to the data signals.

[0120] FIG. 19 shows the clock distribution for a preferred hardware emulation system. Clocks may enter the system either through a clock connector 620 on control board 600, through multi-box clock connector 630 on control board 600, or as a normal signal on connector 510 of core board 500. As discussed, core board 500 is attached to I/O board 300. For simplicity, only one connector 510 is shown in FIG. 19. However, in a presently preferred embodiment, there are seven connectors 510 on each core board 500. Also, the system may contain multiple I/O board/core board combinations. As described earlier with reference to FIGS. 12 and 14, connector 510 attaches to repeater pod 520 which connects to an external system 540. If clock connector 620 is used to input clocks, connector 620 will also be attached through a cable to external system 540. Clock connector 620 provides a faster method for clocks to enter the emulation system while connectors 510 on core boards 500 provide an easier method for the user.

[0121] Connector 510 on core board 500 connects through connector 330 and FETs 308 to a second row 303 Mux chip 12 on I/O board 300 as described earlier with reference to FIG. 16. Second row 303 Mux chip 12 connects to dedicated clock pins on backplane connector 320 in addition to other connections described earlier. In a presently preferred embodiment, there are sixteen of these pins. From I/O board backplane connector 320, clocks connect through backplane 800 or 802 to control board 600 (see FIG. 13). On control board 600, a Mux chip 12 is used to select a combination of clocks from all of the different potential sources. The system may have up to thirty-two distinct clock sources. Any eight of these may be used on a pair of emulation boards 200. This allows different pairs of emulation boards 200 to have different clocks as might be required, for example, when more than one chip design is being emulated in a single hardware emulation system. Clocks are routed through programmable delay element 604 and buffers 614, then through backplane 800 or 802 to emulation boards 200. As described earlier with reference to FIG. 11, clocks on emulation board 200 may be routed either through buffer 216 or clock generation logic chip 204 (i.e., CoSim logic chip) before going to logic chips 10.

[0122] Logic analyzer clock generator logic chip 602 on control board 600 may also generate clocks. This typically happens when running the system with test vectors. Data from clock RAM 612 is input to a state machine programmed into logic analyzer clock generator logic chip 602 which allows different clock patterns to be created such as return-to-zero, non-return-to-zero, two-phase non-overlapping, etc. Design of such a state machine is well understood by those skilled in the art of control logic design and will not be further described here. From logic analyzer clock generator logic chip 602, the thirty-two generated clocks are communicated to the clock selection Mux chip 12. In a presently preferred embodiment, logic analyzer clock generator logic chip 602 is an XC4036XL device manufactured by Xilinx Corporation, although other programmable logic devices could be used with satisfactory results.

[0123] Multi-box clock connector 630 may serve either to input clocks or to output clocks. Direction is controlled by buffer 608. In a multi-box emulation system, i.e., an emulation system comprised of more than one stand-alone emulation system, one box is designated as the master and the others are designated as slaves. The master box produces the clocks on its multi-box clock connector 630 which are then input to all other slave emulation systems through their multi-box clock connectors 630. In a multi-box system, delay element 604 is programmed in the master box to compensate for the inevitable cable delays between the master and slave boxes.

[0124] It will be recognized by one skilled in the art that FIG. 19 has been considerably simplified for clarity and that there are a large number of interconnections and components not shown. The need for these additional components and interconnections is a matter of design choice.

[0125] Referring now to FIG. 20, the control structure of the hardware emulation system will be discussed. Previous hardware emulation systems have generally suffered from insufficient processing capability. This resulted in long delays when transferring data to or from the system, when loading design data into the system, and when running hardware diagnostics. In a preferred embodiment of the present invention, a two-level processor architecture is used to alleviate this problem. A main processor 700 is attached to control board 600. In a presently preferred embodiment, processor 700 is a PowerPC VME-based processor card available from Themis Computer, although other similar cards could be used with satisfactory results. Processor 700 is electrically connected to the Ethernet and to VME bus 650 on control board 600. VME bus 650 is electrically connected through an interface (not shown in FIG. 20) to backplane 800 or 802, and then to logic boards 200 and I/O boards 300. VME bus 650 also connects through JTAG interface 660 on control board 600 and backplane 800 to the mux boards 400.

[0126] Each logic board 200 and I/O board 300 has a local processor with a VME interface and memory. This circuit will be discussed with reference to logic board 200 although a similar circuit exists on each I/O board 300. Processor 206 (shown earlier in FIG. 11) is electrically connected through VME interface 222 to VME bus 650 on backplane 800 or 802. It is also electrically connected to a controller 221. In a preferred embodiment, controller 221 is comprised of several XC5215 FPGAs from Xilinx Corporation. Controller 221 provides JTAG testing signals to other components on logic board 200. In addition, various devices such as flash EEPROM 224 and dynamic RAM 226 connect to processor 206. Processors 206 can operate independently when doing board level diagnostics, loading configuration data into logic chips 10 or transferring data to and from memories 208 and 210 (shown earlier in FIG. 11).

[0127] Referring now to FIG. 20a, the logic analyzer circuit for the preferred embodiment system will be discussed in detail. The logic analyzer is distributed. This means that portions of the logic analyzer are contained on each logic board 200 while centralized functions are contained on control board 600. Events, i.e., combinations of signal states in the design undergoing emulation, are generated inside the logic chips 10 and 204 on the logic boards 200. These are combined in pairs and output on signals 236, which are then ANDed together in a special event logic chip 232 (shown in FIG. 20a as AND gate 232). The resulting combined event signals are separated into eight signals by flip-flops 230 (for simplicity, only two flip-flops 230 are shown in FIG. 20a). Separated event signals 240 then go through the backplane 800 or 802 (not shown in FIG. 20a) to the control board 600 where they are again AND'ed by AND gate 678 (which is part of a logic chip) with events from other boards or other boxes. Connector 670 may contribute event signals from other emulation boxes. The final event signals go to the trigger generator logic chip 674 on the control board 600 which computes a trigger condition and conditional acquisition condition and generates an acquire enable signal 238 which controls acquisition of data on the logic boards 200. The output of the trigger generator logic chip 674 is sent through buffer 671 to connector 672 and through delay element 676. The output of delay element 676 is buffered by buffers 673 and sent across backplane 800 or 802 to a logic analyzer memory controller 234 on logic boards 200. The control board 600 also generates the trace and functional test clocks and other logic analyzer/pattern generator signals.

[0128] Referring now to FIG. 20b, the data path for logic analyzer signals is shown. Data signals are latched in the logic chips 10 and 204 and scanned out into synchronous graphics RAMs (SGRAMs) 210 on the emulation boards 200. The logic analyzer data path is distributed across all the logic boards 200. Each Mux chip 12 on the logic boards 200 has eight pins connected to a 256K×32 SGRAM 210. The SGRAM 210 operates at high speed while the emulation is running to save logic analyzer data. Data is time-multiplexed anywhere from two-to-one to sixty-four-to-one, depending on the desired logic analysis speed, channel depth and number of probed signals as shown in the chart below:

Logic Analyzer Tradeoffs

Max Speed   Depth   Channels/Logic Board   Time-mux factor
16 MHz      128K       864                 2-1
8 MHz       64K      1,728                 4-1
4 MHz       32K      3,456                 8-1
2 MHz       16K      6,912                 16-1
1 MHz       8K      13,824                 32-1
0.5 MHz     4K      27,648                 64-1 (All Signals)

[0129] The maximum speed numbers shown above are approximate and will vary depending on the logic analyzer design and the multiplexing clock speed.
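By way of illustration only, the rows of the chart can be reproduced from three base figures. The sketch below is not part of the disclosed system. It assumes 432 data channels per logic board (fifty-four Mux chips with eight pins each, as stated where frames are described), a 256K-word memory, and a base rate of 32 MHz, which is inferred from the 16 MHz entry at two-to-one and is an assumption rather than a stated parameter. The function name `tradeoff` is hypothetical.

```python
# Illustrative sketch: derive one row of the tradeoff chart from the time-mux factor.
SGRAM_DEPTH_WORDS = 256 * 1024   # 256K words per logic board memory
BASE_CHANNELS = 54 * 8           # 432 channels: 54 Mux chips x 8 pins each
BASE_RATE_MHZ = 32.0             # assumption inferred from the 16 MHz row at 2-to-1


def tradeoff(factor):
    """Return (approx. max speed in MHz, depth in frames, channels per board)."""
    if factor not in (2, 4, 8, 16, 32, 64):
        raise ValueError("time-mux factor must be 2, 4, 8, 16, 32 or 64")
    speed_mhz = BASE_RATE_MHZ / factor      # each frame takes `factor` mux clocks
    depth = SGRAM_DEPTH_WORDS // factor     # each frame uses `factor` RAM words
    channels = BASE_CHANNELS * factor       # each channel carries `factor` signals
    return speed_mhz, depth, channels


for f in (2, 4, 8, 16, 32, 64):
    print(f, tradeoff(f))
```

Under these assumptions a factor of 64 yields 27,648 channels at a depth of 4K, which agrees with the last row of the chart.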

[0130] At a 0.5 MHz rate, a sufficient number of channels are available so that it is possible to probe every flip-flop or latch in the emulated design simultaneously. When a signal is “probed”, the value of the signal at that element or node is read. Generally, this value is then stored in a memory element (SGRAM 210). By reconstructing combinational signals in software, the user can view any set of signals for several thousand clocks around a trigger condition without moving probes or even restarting the emulator. When it is desired to probe a combinational signal, the software examines the design netlist. A cone of logic is extracted in which each combinational logic path leading to the desired signal is traced backwards until it terminates either at a probed storage element (i.e., a flip-flop or a latch) or at an external input of the design. The logic function for the desired signal is then derived in terms of all the storage elements or external inputs contributing to it. Finally, the value of the desired signal is calculated for each instant of time by evaluating the logic function using the previously saved values for all storage nodes and external inputs. The logic function is evaluated at each point where one of the inputs to the logic cone changes. This is done as part of the design debug software.

[0131] For example, in FIG. 20d, probed signal E can be calculated by extracting its combinational logic cone which terminates at storage elements B, C, D and design input A. The equation for signal E is evaluated whenever signals A, B, C, D change. A waveform for signal E can then be displayed exactly as if a physical probe were placed on it. This full visibility greatly speeds up debugging for complex design problems. Full visibility can also be available at a higher frequency if the number of flip-flops per logic chip 10 or 204 is limited.
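The cone extraction and evaluation described above may be sketched as follows. This is a minimal illustration, not the design debug software itself. The gates of FIG. 20d are not given in the text, so the netlist below (E = (A AND B) OR (C XOR D)) is purely hypothetical, as are the names `NETLIST`, `cone_leaves`, `evaluate` and `reconstruct`. For simplicity the sketch evaluates E at every saved sample rather than only where an input changes.

```python
# Hypothetical netlist: net name -> (operator, [input nets]). Not taken from FIG. 20d.
NETLIST = {
    "n1": ("and", ["A", "B"]),
    "n2": ("xor", ["C", "D"]),
    "E": ("or", ["n1", "n2"]),
}
# Nets whose values were saved by the logic analyzer:
# storage elements B, C, D and design input A.
PROBED = {"A", "B", "C", "D"}

OPS = {
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}


def cone_leaves(net):
    """Trace backwards from `net` until probed elements or inputs are reached."""
    if net in PROBED:
        return {net}
    _, inputs = NETLIST[net]
    leaves = set()
    for source in inputs:
        leaves |= cone_leaves(source)
    return leaves


def evaluate(net, sample):
    """Evaluate `net` from one saved sample (a dict of probed net values)."""
    if net in sample:
        return sample[net]
    op, inputs = NETLIST[net]
    left, right = (evaluate(source, sample) for source in inputs)
    return OPS[op](left, right)


def reconstruct(net, samples):
    """Rebuild the waveform of an unprobed combinational net."""
    return [evaluate(net, sample) for sample in samples]


samples = [
    {"A": 0, "B": 0, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 1},
    {"A": 0, "B": 0, "C": 1, "D": 0},
]
print(sorted(cone_leaves("E")))   # ['A', 'B', 'C', 'D']
print(reconstruct("E", samples))  # [0, 1, 0, 1]
```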

[0132] At higher speeds, i.e., speeds higher than 0.5 MHz, the user must specify which signals to probe. However, because each logic board 200 has its own logic analyzer memories 210, changing the signal being probed is fast. The reason for this is that probes do not need to be routed over the backplane, as in prior art emulation systems. Referring again to FIG. 20b, inside each logic chip 10 or 204, an additional logic circuit 2000 is added to the user's design which is programmed into logic chips 10 or 204. If a custom designed logic chip is used, this logic circuit 2000 could be designed (i.e., hard-wired) into the chip. A number of dedicated scan registers are added depending on the number of signals to be probed. The maximum depth of the scan registers is determined according to the table above. Each dedicated scan register is also known as a scan chain. Disposed between each scan flip-flop 2004 is a two-to-one multiplexer 2005. The output of each multiplexer 2005 feeds the input D of the scan flip-flop 2004 which follows it. The first input to each multiplexer 2005 is provided by a node in the user's design. The second input to each multiplexer 2005 is provided by the output Q of the preceding scan flip-flop 2004. The select input to the multiplexers 2005 is trace clock 2002, the function of which is discussed below. The scan flip-flops 2004 are clocked by the Mux Clock Signal 44. Together, a series of scan flip-flops 2004 and multiplexers 2005 form a scan register or scan chain. Depending on the length of scan chains and the number of signals to be probed, each logic chip 10 or 204 will have zero, one, or a plurality of scan chains. The number of scan chains in a given chip depends on the number of flip-flops or signals to be probed. As explained later, the software will assign signals to scan chains to minimize the number of chains and simplify the chip routing. In a preferred embodiment, a maximum of twelve scan chains and twelve I/O pins per logic chip 10 or 204 are required in order to probe all flip-flops or latches in an emulated design. To achieve the fastest possible logic analyzer operating speed, the scan chains and SGRAMs 210 operate at twice the time-multiplexing frequency. A bit of data is output on each scan output pin 2006 for every cycle of the time-multiplexing clock.
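The behavior of one scan chain can be illustrated with a short behavioral model. This is a sketch under stated assumptions, not the circuit of FIG. 20b: it assumes that an asserted trace clock selects the design-node input of every multiplexer, that the first multiplexer in the chain has a constant 0 on its shift input, and that the scan output pin is driven by the last flip-flop. The function names are hypothetical.

```python
def mux_clock_edge(chain, design_nodes, trace_clock, scan_in=0):
    """Next state of the scan flip-flops after one Mux Clock edge.

    trace_clock asserted: every multiplexer selects its design node (parallel load).
    trace_clock deasserted: every multiplexer selects the preceding flip-flop (shift).
    """
    if trace_clock:
        return list(design_nodes)
    return [scan_in] + chain[:-1]


def capture_frame(design_nodes):
    """Load one sample, then shift it out bit by bit on the scan output pin."""
    chain = mux_clock_edge([0] * len(design_nodes), design_nodes, trace_clock=True)
    output_bits = []
    for _ in range(len(chain)):
        output_bits.append(chain[-1])   # last flip-flop drives the output pin
        chain = mux_clock_edge(chain, design_nodes, trace_clock=False)
    return output_bits


print(capture_frame([1, 0, 1, 1]))  # [1, 1, 0, 1]: last flip-flop is shifted out first
```

In this model a chain of N flip-flops needs N Mux Clock cycles to empty, which is consistent with the frame lengths given in the tradeoff chart.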

[0133] Referring now to FIG. 20c, logic analyzer events are also distributed on the logic boards 200. This avoids the need to route design signals contributing to events over the backplane 800 or 802. Events are detected using additional dedicated logic 2000 inserted into each logic chip 10 or 204 on the logic boards 200.

[0134] Signals contributing to events are latched by the same scan flip-flops 2004 used for logic analyzer data and previously shown in FIG. 20b. These signals are then routed to JTAG programmable edge detectors comprising CLB memories 2018 (CLB memory is memory available on the logic chips 10, 204) which are then AND'ed together using wide edge decoder 2012 to form eight event signals. The eight event signals inside each logic chip 10, 204 are combined two to a pin using multiplexer 2020 and output to the emulation board as event signals 236 (also shown in FIG. 20a) where they are again AND'ed with the event signals from other FPGAs. The board level event signals are transmitted over the backplane to the control board where they are AND'ed with event signals from other emulation boards and other boxes. The resulting system wide event signals go to the trigger logic chip 674 on the control board where they are used to generate an acquisition enable and other logic analyzer control signals.

[0135] Signals contributing to events may be defined by the user of the emulation system before compilation by filling out a form that is displayed to the user on the workstation connected to the emulation system. If this is done, sufficient configurable logic blocks (CLBs) in the logic chips 10, 204 (CLBs are the logical building blocks used to implement functionality in logic chips 10, 204) will be reserved during the compilation process to allow all the necessary event logic to fit. Any number of signals can be predefined with only a minimal impact on capacity (approximately four CLBs per signal). New signals can also be added after the full compile is complete. This will require an incremental recompilation and redownload to create additional edge detectors and route the new signals. Once all signals contributing to events have been defined, the user has total flexibility to change event conditions on the fly while the emulation is running. Breakpoints, trigger conditions and conditional acquisition conditions can be modified and the logic analyzer restarted without stopping the emulation. This is made possible by using JTAG programming to set up the event logic.

[0136] FIG. 20c shows a logic chip 10 or 204 with all the event and scan logic inserted. The design is divided into scan registers comprised of scan flip-flops 2004 and multiplexers 2005, an event register comprised of flip-flops 2010, a JTAG interface 2016 and 2014, a set of edge detectors 2018 and wide edge decoder 2012.

[0137] Event signals cannot be saved in the scan flip-flops 2004 because the contents change as the logic analyzer data is shifted out. Thus, event flip-flops 2010 are used to remember the current and previous state for all signals contributing to events. The event register 2010 is clocked once on the next scan clock after the scan register 2004 has been loaded by Trace Clock Signal 2002 (discussed below). Alternatively, the scan register 2004 could be a parallel shadow register and tristate buffers could be used to load the scan data onto the scan output pins.

[0138] Outputs from the event flip-flops 2010 are used as inputs to the edge detectors 2018. Edge detectors 2018 are comprised of dual port CLB memories. Each CLB memory is loaded to perform the desired level/edge detection for two input signals and produces one event output. The outputs from all the CLB memories belonging to one event are AND'ed together using the built-in wide decoders 2012 to form one event signal for this logic chip 10. Event signals are then combined using a multiplexer 2020 and output to a tristate buffer 2022 at the I/O pin. Every time a user signal is needed for any event, it is attached to all eight events so event definitions can be changed at run time.

[0139] The CLB memory used in edge detector 2018 is programmed over the JTAG bus. This is done with a counter 2016 and decoder 2014 by using the dual port memory feature of the preferred embodiment logic chip 10, 204. For large numbers of event circuits, creating and routing select signals from decoder 2014 can take a significant fraction of the logic chip 10 gate capacity. As an alternative, a shift register can be created containing all edge detector memories 2018. This alternative, however, prevents random access.

[0140] Each signal contributing to an event requires approximately four CLBs plus a small amount of overhead for the JTAG interface. It is assumed that whenever a signal is added, the necessary logic is inserted to allow it to be used as part of any or all of the eight events. If the user specified exactly which event the signal was to be used for, only one-half of a CLB would be required, but this would significantly restrict the ability to make changes to event conditions while the emulation was running.

[0141] The edge detection memory 2018 for each signal/event combination is programmed to detect one of the following conditions:

Event Conditions

Equation        Mnemonic   Description
A = 0           0          0 Level
A = 1           1          1 Level
A = 0 & B = 1   F          Falling Edge
A = 1 & B = 0   R          Rising Edge
A xor B         E          Any Edge
A = 0 & B = 0   S0         Stable at a 0
A = 1 & B = 1   S1         Stable at a 1
A xnor B        S          Stable at a 1 or 0
0               -          Don't use signal
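For illustration, the conditions in the table can be expressed as small lookup tables of the kind that might be loaded into an edge detection memory. The sketch below is an assumption-laden model, not the actual CLB memory contents. It reads A as the current state and B as the previous state of the signal, an interpretation inferred from the Falling Edge and Rising Edge rows. It models one signal per table, and it omits the "Don't use signal" row. The names `CONDITIONS` and `lookup_table` are hypothetical.

```python
# Assumed reading of the table: a = current state, b = previous state of one signal.
CONDITIONS = {
    "0": lambda a, b: a == 0,              # 0 level
    "1": lambda a, b: a == 1,              # 1 level
    "F": lambda a, b: a == 0 and b == 1,   # falling edge
    "R": lambda a, b: a == 1 and b == 0,   # rising edge
    "E": lambda a, b: a != b,              # any edge (A xor B)
    "S0": lambda a, b: a == 0 and b == 0,  # stable at 0
    "S1": lambda a, b: a == 1 and b == 1,  # stable at 1
    "S": lambda a, b: a == b,              # stable at 1 or 0 (A xnor B)
}


def lookup_table(mnemonic):
    """Four-entry table for one condition, indexed by (a << 1) | b."""
    condition = CONDITIONS[mnemonic]
    return [int(condition(a, b)) for a in (0, 1) for b in (0, 1)]


for name in CONDITIONS:
    print(name, lookup_table(name))
```

Because each condition reduces to a table of this kind, changing an event definition at run time amounts to rewriting memory contents rather than rerouting logic, which is consistent with the JTAG programming approach described above.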

[0142] A logic analyzer cycle starts with the Trace Clock Signal 2002. Trace Clock 2002 is not a tightly controlled signal. It is only guaranteed valid at the rising edge of the Mux Clock Signal (MUXCLK) 44. Trace Clock 2002 causes a synchronous sample of data to be saved in all the scan chains. It also starts the event computation. The board level events are sent to the control module 600 where they are AND'ed together and used to control the trigger generator state machine 674. After several trace clock periods, the trigger generator produces an Acquire Enable signal 238 that controls writing of data to the SGRAM 210 on logic boards 200. The circuit then remains inactive until the next Trace Clock 2002.

[0143] Logic analyzer data is stored in RAMs on each emulation board. As stated earlier, each logic board 200 contains fifty-four Mux chips 12, each of which has eight pins connected to an SGRAM 210. Thus, there are 54*8=432 data channels in the RAM. Logic analyzer data is stored in basic units called frames. A frame is generated following each trace clock 2002 and consists of all the data shifted out once from the logic chip 10 or 204 scan chains. A frame may fill from two to sixty-four RAM locations and take two to sixty-four Mux Clock signal (MUXCLK) cycles to generate. A typical frame looks as follows:

            Data Channels (432)
Frame 0     Data 0
            Data 1
            Data 2
            Data 3
Frame 1     Data 0
            Data 1
            Data 2
            Data 3

[0144] A minimal frame would take only two RAM locations. Frame length is always a power of two. Therefore, legal lengths are two, four, eight, . . . sixty-four RAM locations. To meet the SGRAM 210 timing requirements, sequential writes within a frame are done into opposite banks of the memory. For the minimum size frame, one word of data is stored in the low RAM bank and one word in the high RAM bank.

[0145] Logic board memory is 256K words deep. The memory is divided equally into thirty-two self-contained blocks, each of which has 8192 words and may include between 128 and 4096 frames depending on the frame length. Blocks are fixed length and always start on 8K word boundaries. Within a block, frames may be stored in random order but there is no overlap of frames between blocks. All frames from a later block will have a higher timestamp value than all frames from an earlier block.
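The block arithmetic above can be checked with a few lines of code. The sketch is illustrative only; the constant and function names are hypothetical, while the numeric values are those given in the text.

```python
MEMORY_WORDS = 256 * 1024                   # logic board memory depth in words
BLOCK_COUNT = 32                            # self-contained blocks
BLOCK_WORDS = MEMORY_WORDS // BLOCK_COUNT   # 8192 words per block
LEGAL_FRAME_LENGTHS = (2, 4, 8, 16, 32, 64)


def frames_per_block(frame_length):
    """Number of whole frames that fit in one block."""
    if frame_length not in LEGAL_FRAME_LENGTHS:
        raise ValueError("frame length must be 2, 4, 8, 16, 32 or 64 RAM locations")
    return BLOCK_WORDS // frame_length


def block_of(word_address):
    """Index of the block containing a word address (blocks start on 8K boundaries)."""
    return word_address // BLOCK_WORDS


print(BLOCK_WORDS)           # 8192
print(frames_per_block(2))   # 4096
print(frames_per_block(64))  # 128
print(block_of(8192))        # 1
```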

[0146] The depth of logic board memory 210 is dependent on the designer's choice and the depth of memory chips available. Deeper memories may be used in the future as larger SGRAMs become available.

[0147] A timestamp value is saved in a clock RAM 612 (shown in FIG. 19) on the control board 600 each time a frame is saved on the logic boards 200.

[0148] The logic analyzer supports a conditional acquisition option. This means that individual frames may or may not be written into memory depending on the value of one of the event signals and/or the current state of the trigger state machine. Conditional acquisition allows more efficient use of the memory since only significant data is saved. Conditional acquisition is controlled by an Acquire Enable Signal 238 generated on the control board 600. There is a pipeline delay of approximately four trace clocks after a trace clock 2002 to generate the Acquire Enable Signal.

[0149] Because of the delayed Acquire Enable Signal, it is not possible to determine at the time data is available whether it is supposed to be saved or not. Data is, therefore, always saved into memory and overwritten later if the delayed Acquire Enable Signal shows that it was not good. This results in the data being saved into memory in essentially random order. The correct data order is recovered after the logic analyzer stops by sorting the timestamps saved in clock RAM 612 and distributing a set of pointers to each logic board processor 206. The pointers show the physical memory location of each sequential data sample. The out-of-order data is limited to one block of the memory because it is necessary to handle wraparound of the memory address counter. The oldest block of data must be discarded as soon as the address counter writes again into the first location of the block.

[0150] The logic analyzer control logic chip 674 on the control board also has a Block Register in which five bits of data are saved after each block is written (one-hundred sixty bits total). Four of these bits are the value of the Acquire Enable Signal for each of the last four frames written. One extra bit specifies whether the block was written in sorted order. This is equivalent to saving that Acquire Enable was valid for each trace clock during the block.

[0151] To force blocks not to overlap, the last four frames in each block will always be written, regardless of the state of the Acquire Enable Signal. These last four frames may or may not contain good data. The control module processor examines the corresponding Acquire Enable bits in the Block Register to see whether the data is good or not. The number of actual data frames in a block may, therefore, vary by four.

[0152] This needs to be taken into account when creating the set of pointers for the emulation boards. The last four words of data saved before the logic analyzer stops also may or may not contain good data. This can be determined by flushing the Acquire Enable pipeline into the Block Register after the logic analyzer stops.

[0153] The control module processor 700 is able to read the address of the last frame stored before the logic analyzer was stopped from logic analyzer control chip 674. This is used to determine the last data block written. The first data block is either block 0 if the address counter did not overrun or the next higher block. One additional status bit is necessary which is set when the address counter overruns for the first time.

[0154] The last data block being written when the logic analyzer stopped will probably contain some old frames written during the previous wrap-around of the address counter. These must be discarded. The frames to be discarded can be determined by sorting with the timestamp value and discarding any frames that have a timestamp earlier than the earliest timestamp in the first data block.

[0155] For example, assume that the frame length was one (instead of two to sixty-four), there were eight frames per block (instead of 4096) and the memory had a depth of twenty-four (instead of 262,144). The logic board and control board memories might have the following data after the logic analyzer stopped:

              Logic Board    Control Board   Block Register
Address       Data Memory    Timestamp       Sorted   Acq. Enable
0             28             43              0        0111
1             18             47
2 <-Counter   17             45
3             92             4
4             93             5
5             94             6
6             95             7
7             96             8
8             3              13              0        0101
9             1              10
10            2              12
11            5              27
12            7              29
13            14             30
14            8              31
15            27             32
16            3              33              1        0111
17            9              34
18            10             35
19            11             37
20            12             39
21            13             40
22            14             41
23            17             42

[0156] The address counter stopped at location 2 and the Address Overflow bit is set. This means that the block from location 0 to 7 is the last block and the block from location 8 to 15 is the first block. By looking at the Acquire Enable bits stored for the first block, it can be determined that the frames at locations 12 and 14 at the end of the first block are good and the frames at locations 13 and 15 are bad. All other frames in the block are good, otherwise the address counter would not have incremented to the next block. After sorting by timestamp and removing the bad data, the first block is:

           Emulation Board   Control Board
Address    Data Memory       Timestamp
9          1                 10
10         2                 12
8          3                 13
11         5                 27
12         7                 29
14         8                 31

The bad frames can be removed either before or after sorting by the timestamp.

[0157] The second block is processed next. The frame at address 23 is bad in the second block starting at address 16. The block does not need to be sorted because the Sorted bit for this block is set in the Block Register. After removing the bad frame, the block looks like:

           Emulation Board   Control Board
Address    Data Memory       Timestamp
16         3                 33
17         9                 34
18         10                35
19         11                37
20         12                39
21         13                40
22         14                41

[0158] The last block, starting at address 0, is now processed. First the block is sorted by timestamp to give:

           Emulation Board   Control Board
Address    Data Memory       Timestamp
3          92                4
4          93                5
5          94                6
6          95                7
7          96                8
0          28                43
2          17                45
1          18                47

[0159] Next, all frames with timestamps earlier than the first timestamp in the first block (10) are discarded. This leaves only three frames in the block.

           Emulation Board   Control Board
Address    Data Memory       Timestamp
0          28                43
2          17                45
1          18                47

[0160] The Block Register Acquire Enable bits for the last block contain the last values from the Acquire Enable pipeline. The register contents for this block are 0111. This means that the last frame at address 1 is bad and the other two frames at address 0 and 2 are good. The low order bit is meaningless since only three frames have been written to this block. The last block then looks like:

           Emulation Board   Control Board
Address    Data Memory       Timestamp
0          28                43
2          17                45

[0161] and the complete set of recovered data is:

           Emulation Board   Control Board
Address    Data Memory       Timestamp
9          1                 10
10         2                 12
8          3                 13
11         5                 27
12         7                 29
14         8                 31
16         3                 33
17         9                 34
18         10                35
19         11                37
20         12                39
21         13                40
22         14                41
0          28                43
2          17                45
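The recovery procedure of the preceding example may be sketched in software as follows. This is a minimal illustration under stated assumptions, not the control module software. It handles only the case in which the Address Overflow bit is set, it reads the leftmost Acquire Enable bit as belonging to the most recently written frame of a block (an interpretation inferred from the worked example), and it applies those bits in timestamp order. The names `MEMORY`, `BLOCK_REGISTER`, `drop_bad_tail` and `recover` are hypothetical.

```python
FRAMES_PER_BLOCK = 8  # toy value from the example above

# address -> (data word, timestamp), transcribed from the example tables
MEMORY = {
    0: (28, 43), 1: (18, 47), 2: (17, 45), 3: (92, 4),
    4: (93, 5), 5: (94, 6), 6: (95, 7), 7: (96, 8),
    8: (3, 13), 9: (1, 10), 10: (2, 12), 11: (5, 27),
    12: (7, 29), 13: (14, 30), 14: (8, 31), 15: (27, 32),
    16: (3, 33), 17: (9, 34), 18: (10, 35), 19: (11, 37),
    20: (12, 39), 21: (13, 40), 22: (14, 41), 23: (17, 42),
}
# One (Sorted bit, Acquire Enable bits) entry per block.
BLOCK_REGISTER = [(False, "0111"), (False, "0101"), (True, "0111")]


def drop_bad_tail(frames, enable_bits):
    """frames are oldest first; enable_bits[0] is assumed to be the newest frame."""
    kept = []
    for age, frame in enumerate(reversed(frames)):
        if age < len(enable_bits) and enable_bits[age] == "0":
            continue  # Acquire Enable was low for this frame: bad data
        kept.append(frame)
    kept.reverse()
    return kept


def recover(memory, block_register, counter_address):
    """Return good frames as (address, data, timestamp), oldest first.

    Assumes the address counter has wrapped (Address Overflow bit set).
    """
    block_count = len(block_register)
    last_block = counter_address // FRAMES_PER_BLOCK
    first_block = (last_block + 1) % block_count
    recovered = []
    earliest = None
    for step in range(block_count):
        block = (first_block + step) % block_count
        is_sorted, enable_bits = block_register[block]
        start = block * FRAMES_PER_BLOCK
        frames = [(a, *memory[a]) for a in range(start, start + FRAMES_PER_BLOCK)]
        if not is_sorted:
            frames.sort(key=lambda frame: frame[2])
        if block == last_block and earliest is not None:
            # Discard stale frames left over from the previous wrap-around.
            frames = [frame for frame in frames if frame[2] >= earliest]
        frames = drop_bad_tail(frames, enable_bits)
        if earliest is None:
            earliest = frames[0][2]
        recovered.extend(frames)
    return recovered


result = recover(MEMORY, BLOCK_REGISTER, counter_address=2)
print([frame[0] for frame in result])
```

Run on the example data, the sketch returns addresses 9, 10, 8, 11, 12, 14, 16 through 22, 0 and 2, in that order, which matches the complete set of recovered data shown above.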

[0162] The software required to program the preferred embodiment system will now be discussed. The software is updated from, and therefore differs from, the software previously disclosed in U.S. Pat. Nos. 5,109,353, 5,036,473, 5,448,496, 5,452,231 and 5,475,830, the disclosures of which are all incorporated herein by reference. A flow diagram is shown in FIG. 21.

[0163] The source netlist could be directly imported by the netlist importer 1000, produced by a logic synthesis program 1002 such as HDL-ICE™ brand logic synthesis software available from Quickturn Design Systems, Inc., or generated by behavioral testbench compiler 1004. Netlist importer 1000 is capable of taking gate level text netlists in a variety of formats such as EDIF and Verilog and converting the netlists into an internal database netlist format which is represented by database logical libraries that contain hierarchically defined cells, generic cells, and special hardware cells. Special hardware cells include memory specification cells, microprocessor cells, and component adaptor cells. Some of the hierarchically defined cells have a flag that prevents them from being flattened and split among several logic chips 10 to avoid timing problems when routing between chips. The choice and design of netlist import software is a matter of design choice and will not be discussed further. As discussed, a flattened cell is one which contains no hierarchical cells. It only contains the most primitive components such as simple logic gates.

[0164] HDL-ICE™ brand logic synthesizer 1002, which is the presently preferred logic synthesizer 1002, takes register-transfer-level (RTL) Verilog or VHDL netlists and converts them through a logic synthesis process into the database format used by the netlist importer and other compilation steps. Other suitable synthesis products are commercially available from Synopsys, Inc. and others, although the HDL-ICE™ brand logic synthesizer has some advantages such as better integration and higher operating speed.

[0165] Behavioral testbench compiler 1004 allows behavioral testbenches described in Verilog or VHDL to be emulated. Code executing in parallel on processors 206 of one or more logic boards 200 is tightly coupled through co-simulation logic chip 204 to other logic which may come through netlist import program 1000 or HDL-ICE™ brand logic synthesizer. Code executing on processors 206 may be a behavioral (non-synthesizable) representation of a logic design while other logic is in a gate-level or (synthesizable) RTL representation.

[0166] Logic cell memory (LCM) generator 1006 replaces memory specification cells from the user's design that will be implemented using memories built into the logic chips 10, with hierarchically defined cells (hard macros) that define memory cell implementation including, possibly, the mapping to configurable logic blocks within the logic chips 10 and their relative location inside each logic chip 10.

[0167] User data input program 1008 allows the user to enter information necessary for the design compilation, such as clock information, probe information, special net information, etc. This information aids the emulation system in handling certain conditions that can cause problems during the emulation if not handled in a special manner.

[0168] Data qualification program 1010 verifies correctness of the netlist and user data. It finds common netlist errors such as undriven inputs or multiple outputs attached to a net.

[0169] Clock tree extraction program 1012 extracts the clock tree from the hierarchical netlist and identifies clock terminals on all levels of design hierarchy. A description of the operation of this step is disclosed in detail in U.S. Pat. No. 5,475,830.

[0170] Hierarchical partition planning program (HPP) 1014 is used for the physical module chip partitioning algorithm. It identifies the portions of the design to be mapped to each logic board 200.

[0171] Partition DB setup 1016 prepares the database for parallel execution of the chip partitioning program for each portion identified by HPP 1014.

[0172] Chip partitioning program 1018 identifies the clusters of logic to be implemented in each separate logic chip 10.

[0173] NGD Out program 1020 creates NGD files corresponding to each chip based on the results of chip partitioning. NGD is a file format common to various software programs available from Xilinx Corporation. NGD files contain logic and routing information necessary to implement a logic design in a logic chip. As discussed, in the presently preferred embodiment, logic chips from Xilinx are utilized. The NGD Out program 1020 translates database information into the NGD format. NGD Out program 1020 also starts parallel partition, place and route (PPR) jobs 1022 for the individual logic chips 10 with an arbitrary I/O pin assignment. PPR program 1022 is a program commercially available from Xilinx Corporation which produces programming files for the FPGAs Xilinx manufactures.

[0174] Physical DIB Generation program 1026 prepares the physical database to be used by the board partitioning program. The physical database contains information about the physical connections between logic chips 10 and Mux chips 12 for each board in the system.

[0175] Board partitioning program 1028 identifies the placement of logic gates into logic chips within each pair of logic boards 200. It considers the limitations on memory instances that can be implemented on each logic board 200, the logic analyzer probe channels limitation, the one microprocessor per board limitation as well as backplane and turbo connector limitations.

[0176] EBM compilation program 1030 combines all remaining memory specification cells assigned to the same logic board 200 into no more than twelve groups corresponding to the RAMs 208 (previously shown in FIG. 11). The I/O signals that connect to SRAM chips 208 are marked with corresponding pin numbers.

[0177] System routing module 1032 selects the physical nets and time-division multiplexing (TDM) phases to implement logical nets that cross the chip boundaries. It assigns pin numbers and TDM phases to all chip I/O pins. It also produces the programming data for Mux chips 12 and repeater pods 520.

[0178] NGD update program 1034 starts final incremental PPR jobs 1036 for each logic chip 10 providing the final connectivity of TDM logic and I/O assignment. When the jobs are successfully completed, the compilation is finished.

[0179] Details of the functionality of the various programs will now be described further.

[0180] Referring to FIG. 22, the sequence of steps necessary for the compilation of a software-hardware model created by behavioral testbench compiler 1004 is shown. Compilation starts from the user's source code in Verilog or VHDL. As a result of an import process 1100, the behavioral database representation 1102 is created. After model compilation is finished, it results in a logic representation of an emulation model 1114 and a set of executables 1112 downloadable into logic module processor DRAMs 226 previously shown in FIG. 20.

[0181] The behavioral testbench compiler software 1004 includes four executables and a runtime support library.

[0182] The importer 1100 processes the user's Verilog or VHDL source files and produces a behavioral database library 1102. It accepts a list of source file names and locations and file names for libraries where the otherwise undefined module references are resolved. The source file names are the file names used by Verilog or VHDL.

[0183] The preprocessor 1104 transforms the behavioral database library 1102 created by importer 1100 into a new behavioral database library 1106. It performs partitioning of the behavioral code into clusters (also referred to as partitions) directed for execution on each of the available processors 206 (see FIG. 11) and determines the execution order of the code fragments, and the locality of variables in the partitions. Code fragments are independent pieces of code which can be executed in parallel on processors 206. Also, the preprocessor does all the transformations necessary for creation of a model free of hold time violations. See, for example, U.S. Pat. No. 5,259,006 to Price et al., the disclosure of which is hereby incorporated by reference in its entirety.

[0184] The code generator 1110 reads the behavioral database library 1106 as transformed by the preprocessor 1104 and produces downloadable executables for each of the clusters identified by the preprocessor 1104. These executables will be downloaded into DRAMs 226 for execution on processors 206.

[0185] The netlist generator 1108 reads the behavioral database library as transformed by the preprocessor 1104 and produces a logical database library 1114 for further processing by the other compiler programs 1006-1036. To represent special connections of the co-simulation logic chip 204 to the microprocessor bus and event synchronization bus (see FIG. 11), the netlist generator 1108 will create the netlist structure shown in FIG. 23. MP Cell 1200 is a special cell corresponding to processor 206 which will not be clustered by chip partitioning program 1018 (similar to the LBM cell instances). Peripheral controller cell 1202 is a regular cell that contains library component instances and will be placed into co-simulation logic chip 204. Only a minimal amount of logic that directly interacts with the microprocessor bus will be placed into this cell 1202. Placing minimal amounts of logic into the peripheral controller cell 1202 prevents the need for wait state programming. Peripheral controller cell 1202 will be flagged to prevent chip partitioning program 1018 from splitting it among several logic chips 10. It is a responsibility of netlist generator 1108 to make sure that the capacity of this cell does not exceed the capacity of a single logic chip 204 and that the number of connections between this cell and the rest of the netlist does not exceed the number of connections between co-simulation logic chip 204 and Mux chips 12. As discussed previously, co-simulation logic chip 204 has three pins electrically communicating with each of fifty-four Mux chips 12. This means that one hundred sixty-two connections are available between the co-simulation logic chip 204 and the Mux chips 12 (3*54=162) as shown in FIG. 11. Netlist generator 1108 will also mark special nets that connect to the MP cell 1200 with the corresponding pin numbers that will guide system router 1032 to generate correct physical connections for co-simulation logic chip 204. This is required because connections between processor 206 and co-simulation logic chip 204 are attached to specific pins of logic chip 204.
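By way of illustration only, the two limits that netlist generator 1108 enforces on peripheral controller cell 1202 can be sketched as follows. The function names, the gate-count parameters, and the idea of expressing capacity as a single gate count are assumptions made for this sketch; only the figures of three pins and fifty-four Mux chips come from the description above.

```python
# Illustrative sketch only; names and the gate-count capacity model are hypothetical.
PINS_PER_MUX_CHIP = 3   # pins from co-simulation logic chip 204 to each Mux chip 12
NUM_MUX_CHIPS = 54      # Mux chips 12 reachable from co-simulation logic chip 204


def max_cosim_connections() -> int:
    """Connections available between chip 204 and the Mux chips (3*54)."""
    return PINS_PER_MUX_CHIP * NUM_MUX_CHIPS


def peripheral_cell_fits(external_nets: int, cell_gates: int,
                         chip_gate_capacity: int) -> bool:
    """True if the cell respects both the connection limit and the chip capacity."""
    return (external_nets <= max_cosim_connections()
            and cell_gates <= chip_gate_capacity)
```

A cell with 162 external nets would pass the connection check, while one with 163 would have to be reduced before chip partitioning.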

[0186] Behavioral testbench compiler 1004 has been fully disclosed in a co-pending application:

[0187] Method And Apparatus For Design Verification Using Emulation And Simulation, Ser. No. 08/733,352 by Sample et al., which is incorporated herein by reference in its entirety.

[0188] Logic Chip Memory (LCM) generator 1006 implements shallow but highly ported memories using Xilinx relationally placed macros (RPMs). It supports memories with up to fourteen write ports, any number of read ports, and one additional read-write port for debug access. It utilizes synchronous dual-port RAM primitives which are available as components of the logic chip 10.

[0189] FIG. 22a shows an example of a memory circuit that could be generated by LCM memory generator 1006 for placement in a logic chip 10. The memory circuit in FIG. 22a comprises the following components:

[0190] A write enable sampler and arbitrator 1050 synchronizes write enable signals with a fast clock and prioritizes the write operations of the memory circuit when there are requests from several ports at once. The write enable sampler and arbitrator 1050 outputs write address/data mux selects and write enable signals. Write enable sampler and arbitrator cells are pre-compiled into a reference library in the form of hard macros with various different write port configurations from two to sixteen write ports.

[0191] The memory circuit of FIG. 22a also comprises a read counter 1052. Read counter 1052 is used to cycle through the read ports of the memory to be implemented. These counters are also pre-compiled into a reference library as hard macro cells with various count lengths.

[0192] The memory circuit of FIG. 22a also comprises a multiplexer 1053 which places either the output of the read counter 1052 or the write enable sampler and arbitrator 1050 on its output.

[0193] The output of multiplexer 1053 is the slot select signal SLOT_SEL, which comprises four wires allowing any one of sixteen slots (or ports) to be selected.

[0194] The memory circuit of FIG. 22a also comprises address muxes and data muxes 1056. Address muxes and data muxes 1056 are used to select port write/read address data and port write data when the appropriate slot or port time arrives. The slot select signal SLOT_SEL is input to the select inputs of the address muxes and data muxes 1056 to perform this function.

[0195] The memory circuit of FIG. 22a also comprises memory 1058. Memory 1058 is a static RAM memory available as one or more Xilinx configurable logic block (CLB) components.

[0196] The memory circuit of FIG. 22a also comprises read slot decoder 1054. Read slot decoder 1054 decodes the slot select signal SLOT_SEL (of which there are four) into up to sixteen individual wires to be used as the clock enable inputs for the output registers 1060.
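By way of illustration only, the behavior of read slot decoder 1054 can be modeled as a one-hot decode of the four-wire SLOT_SEL value. The function name, the `num_ports` parameter, and the convention that an unused slot enables no register are assumptions of this sketch and are not taken from FIG. 22a.

```python
# Illustrative behavioral model; names and the unused-slot convention are hypothetical.
def decode_slot_select(slot_sel: int, num_ports: int = 16) -> list[int]:
    """Return one clock-enable bit per output register 1060 (one-hot)."""
    if not 1 <= num_ports <= 16:
        raise ValueError("SLOT_SEL has four wires, so at most sixteen slots exist")
    if not 0 <= slot_sel < 16:
        raise ValueError("SLOT_SEL must fit in four wires")
    # Exactly one enable is asserted when the selected slot maps to a real port.
    return [1 if port == slot_sel else 0 for port in range(num_ports)]
```

With sixteen ports, a SLOT_SEL value of 5 asserts only the sixth clock enable; with fewer ports, a slot number beyond the last port asserts none.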

[0197] Referring back to FIG. 21, the width, depth and number of ports generated by LCM memory generation program 1006 depend on the requirements of the netlists produced by netlist import program 1000, HDL-ICE™ brand synthesizer program 1002 or Behavioral Testbench program 1004. The Xilinx relationally placed macros (RPMs) are created as database cells defined using generic cell instances, as well as instances of special FMAP and HMAP cells to control the mapping of the memory circuits into the particular logic modules of the logic chips 10. FMAP and HMAP cells are special primitive components which control the behavior of the Xilinx PPR program 1022. As discussed, in the presently preferred embodiment, these logic modules are the CLBs in the Xilinx FPGAs. These instances can also have an RLOC property that specifies the relative location of a logic module (a CLB in the presently preferred embodiment) where the logic is to be placed.

[0198] The RPM cells must be flagged (in the presently preferred embodiment, this flag is referred to as “NOFLAT”) to prevent the chip partitioning program 1018 from splitting them between several logic chips. The RPM cells must also have precalculated capacity values and a property containing their dimensions (number of logic modules, e.g., CLBs, used horizontally and vertically).

[0199] Data qualification program 1010 does not verify the netlist inside RPM cells because parallel connection of FMAP and HMAP primitives to the logic primitives may create an appearance of design rule violation. The NGD Out program 1020 will preserve RLOC values in all primitives in each RPM instance. This will allow PPR 1022 to place RPMs in a chip in such a manner as to satisfy the constraints defined by RLOC properties.

[0200] User data input program 1008, in addition to allowing the user to enter clock and other design information, also computes the global probe multiplexing factor. Probes are the points inside a netlist which will be observed during debugging of the design. The probe multiplexing factor determines the length of scan chains which will be added to the logic chips 10. The user can either list the probes or request a full visibility mode. In the case of full visibility, the multiplexing factor is sixty-four. If the user wants only a specified list of signals to be visible, then the multiplexing factor should be computed as:

(Number of probes)*(Deviation factor)/(432*(Number of logic boards))

[0201] The number of logic boards 200 must be known when the computation is made. Deviation factor is an experimentally determined factor used to account for possible non-uniform distribution of probed signals among logic boards 200. Probability theory considerations suggest a value between 1.4 for large systems and 1.7 for two-board systems. For a system with B boards it is approximately 1/(1−0.29*sqrt(B/(B−1))). This factor can be further increased to provide room for adding probes incrementally without recompilation of more than one logic board 200. Logic analyzer events in the preferred embodiment system are computed by the programmable logic in the logic chips 10 on logic boards 200. Therefore, capacity should be reserved in logic chips 10 for event calculations. Consequently, if the user delays signal and event definition until after the design compilation, an incremental recompile of affected chips will be necessary. In the case when the reserved capacity is insufficient for a given chip, signals will need to be routed to other logic chips 10 that have sufficient capacity to build an event detector, as previously shown in FIG. 20c. This can result in a longer compilation time. A long compilation time can be avoided by specifying, before compilation, all signals that are used to create any event. It is unnecessary to actually define events or triggers at this point because this has no effect on capacity. The event logic function itself can be downloaded into the logic chip during its operation using the JTAG bus connected to controller 221 (shown in FIGS. 20 and 20c).

[0202] Finally, during this user data input step 1008, the user needs to select the time-multiplexing factor for non-critical signals. As discussed above, the time-multiplexing factor can be either one, two, or four.

[0203] Chip partitioning programs 1016 and 1018 use a clustering based algorithm. Examples of similar algorithms can be seen in prior art hardware emulation systems such as the System Realizer™ emulation system from Quickturn Design Systems, Inc. In the presently preferred embodiment, however, there are a number of differences. These differences will now be explained in detail.

[0204] 1) Certain types of cells need special attention to avoid improper partitioning, clustering, etc. “No-touch” cells are certain cells which must not be clustered together with any logic. An example of a “No-touch” cell is the MP cell shown in FIG. 23. “No-flat” cells are cells which must not be split among several chips. Examples of “No-flat” cells are latches and hard macros where splitting would introduce timing problems.

[0205] 2) Some special nets do not have drivers and can be cut arbitrarily. In addition to POWER and GROUND, an example of such a special net to which logic gates can be connected is the Mux Clock signal (MUXCLK) 44. In particular, the behavioral testbench compiler 1004 and the EBM compiler 1030 and LCM compiler 1006 will create logic connected to MUXCLK.

[0206] 3) Pin out constraints control the maximum number of nets that a cluster can have.

[0207] Assuming that a cluster of logic has RI regular external input nets, RO regular external output nets, CN critical external nets, P probed signals, and the time-division multiplexing factor for probes is T, the number of pins required to implement this cluster on a chip is calculated as follows (all divide operations are pure integer divisions without rounding).

[0208] a. Without time-multiplexing of logic signals, the number of pins is RI+RO+CN+(P+T−1)/T;

[0209] b. With two-to-one time-multiplexing of logic signals, the number of pins is (RI+1)/2+(RO+1)/2+CN+(P+T−1)/T;

[0210] c. With four-to-one time-multiplexing of logic signals, the number of pins is max((RI+1)/2, (RO+1)/2)+CN+(P+T−1)/T.

[0211] Note: when full visibility mode is selected by the user, the number of probes P is assumed equal to the number of flip/flops and latches.
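By way of illustration only, the three pin-count cases a, b, and c above can be sketched with integer division. The function name and the `tdm` parameter are hypothetical; the arithmetic follows the expressions given above.

```python
# Illustrative sketch of the pin-count expressions; names are hypothetical.
def pins_required(ri: int, ro: int, cn: int, p: int, t: int, tdm: int = 1) -> int:
    """Pins needed by a cluster for TDM factor 1 (none), 2, or 4."""
    probe_pins = (p + t - 1) // t          # (P+T-1)/T as pure integer division
    if tdm == 1:
        logic_pins = ri + ro
    elif tdm == 2:
        logic_pins = (ri + 1) // 2 + (ro + 1) // 2
    elif tdm == 4:
        logic_pins = max((ri + 1) // 2, (ro + 1) // 2)
    else:
        raise ValueError("time-multiplexing factor must be 1, 2, or 4")
    return logic_pins + cn + probe_pins
```

For a cluster with RI=10, RO=7, CN=3, P=100 and T=64, the three cases give 22, 14, and 10 pins respectively.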

[0212] 4) The maximum size allowed for a cluster is based upon the gate capacity of the particular logic chip 10. In addition to logic gates, additional capacity is required for time-division multiplexing, probing and event detection circuitry. Assuming that a cluster of logic has RN regular (non-critical) external nets (RN is equal to RI plus RO), P probed signals, and E signals used in event detection, then the added capacity for time-division multiplexing, probing, and event detection circuitry is as follows:

[0213] a. Without time-multiplexing of logic signals, the additional capacity for the logic analyzer is

[0214] flip/flops: P+2*E+logE

[0215] gates: C₁*P+C₂*((E+1)/2)*8

[0216] In the presently preferred embodiment, the constants are C₁=2, C₂=4. They may be adjusted later based on experimental results.

[0217] b. With any type of time-multiplexing (2:1, 4:1, or other schemes), an additional RN flip/flops are needed in addition to those required for the logic analyzer.
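By way of illustration only, the added-capacity estimate can be sketched as follows. Two points are assumptions of this sketch rather than statements of the description: the term logE is taken as the base-two logarithm of E rounded up, and (E+1)/2 is taken as integer division, matching the pairing of event signals. The function and parameter names are hypothetical.

```python
# Illustrative sketch; log base, rounding, and all names are assumptions.
def added_capacity(p: int, e: int, rn: int, tdm: bool = False,
                   c1: int = 2, c2: int = 4) -> tuple[int, int]:
    """Return (extra flip-flops, extra gates) for probing and event detection."""
    # Assumed reading of logE: ceil(log2(E)), taken as zero when E is 0 or 1.
    log_e = (e - 1).bit_length() if e > 1 else 0
    flip_flops = p + 2 * e + log_e
    gates = c1 * p + c2 * ((e + 1) // 2) * 8
    if tdm:
        flip_flops += rn   # one extra flip-flop per regular external net
    return flip_flops, gates
```

With P=100, E=16 and RN=50, this reading gives 136 flip-flops and 456 gates without time-multiplexing, and 186 flip-flops with it.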

[0218] 5) Partitioning is also controlled by the need to implement the clock tree correctly as explained in U.S. Pat. No. 5,475,830. Each net in the design is assigned a 16-bit integer property which is called CLKMASK. Bit i of CLKMASK should be set if user clock i reaches this net in a direct (non-inverted) phase. Bit 8+i should be set if user clock i reaches this net in an inverted phase. This information will be passed to the PPR program 1022 to perform the required delay adjustment.
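By way of illustration only, the CLKMASK encoding can be sketched as follows. The function name and argument layout are hypothetical, and the sketch assumes user clocks are numbered 0 through 7, which is what a 16-bit property with an offset of 8 for the inverted phase implies.

```python
# Illustrative sketch of the CLKMASK bit layout; names are hypothetical.
def clkmask(direct_clocks: list[int], inverted_clocks: list[int]) -> int:
    """Build the 16-bit CLKMASK for a net from the clocks that reach it."""
    mask = 0
    for clock in direct_clocks:
        if not 0 <= clock < 8:
            raise ValueError("user clock index must be 0..7")
        mask |= 1 << clock          # bit i: clock i arrives non-inverted
    for clock in inverted_clocks:
        if not 0 <= clock < 8:
            raise ValueError("user clock index must be 0..7")
        mask |= 1 << (8 + clock)    # bit 8+i: clock i arrives inverted
    return mask
```

A net reached by clocks 0 and 3 in direct phase and clock 1 in inverted phase would carry the value 0x0209.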

[0219] The NGD Out program 1020 outputs a netlist in a format suitable for the PPR program 1022 to process. In addition, it performs a number of special functions relating to logic modification to insert time-division multiplexing or debugging logic. These functions are:

[0220] Relationally placed (RP) macro preservation: Relationally placed macros in the database are preserved in the NGD files passed to PPR. RP macros are groups of logic gates that have been mapped into fixed patterns of CLBs inside the Xilinx FPGAs. RP macros will not be re-partitioned in later software steps so as to preserve their timing characteristics.

[0221] TDM cells insertion: Time-division-multiplexing cells are added to the boundary of each logic chip 10 where it connects to a Mux chip 12. Predefined cells are used which are placed relative to the set of I/O pins being multiplexed. FIGS. 24a-24k show all the different varieties of TDM cells which may be inserted depending on the type of the I/O pins. For time-division multiplexing, the terminals of a logic chip 10 and Mux chip 12 are divided into groups of four using the special RPM cells as shown in FIGS. 24a-24k.

[0222] For the remainder of the terminals, groups of two are used, or the regular non-multiplexed I/O already on the logic chip 10 or Mux chip 12 is used. Non-multiplexed I/O is always used for critical nets.

[0223] TDM control logic insertion: TDM control logic generates and distributes the TDM control signals, which are MC, MS, MT, E0, E1, E2, and E3, into the circuits shown in FIGS. 24a-24k. These signals are generated by one of three special control cells which are inserted into each logic chip 10 in addition to the logic shown in FIGS. 24a-24k.

[0224] Generation of these signals is done using logic 104 shown in FIG. 6 or logic 68 shown in FIG. 3. MC is Mux Clock Signal 44; MS is Divided Clock Signal 50; MT is Direction Signal 80; and E0-E3 are the Enable Signals 90, 92, 94 and 96, respectively. The special cells have two inputs NMUXCLK 44 and SYNC- 48 which are connected to fixed input pins on logic chip 10. One type of control cell (not shown) is used for the chips that do not use TDM but have logic connected to Mux Clock Signal (MUXCLK) 44. This cell only outputs Mux Clock Signal (MUXCLK) 44. The second type (logic 68 shown in FIG. 3) is used for designs with two-to-one TDM. It outputs Mux Clock Signal (MUXCLK) 44 and MS (Divided Clock) signals 50. The third type of control cell (logic 104 shown in FIG. 6) is used for four-to-one time-multiplexing. It generates Mux Clock Signal (MUXCLK) 44, MS (Divided Clock) 50, MT (Direction) 80, E0 90, E1 92, E2 94, E3 96.

[0225] Scan cell insertion for probed signals: Each probed signal must be connected to the data input of a probe cell. The probe cell has no outputs and two other inputs. One of these inputs is electrically connected to Mux Clock Signal (MUXCLK) 44. The other input is electrically connected to the Trace Clock Signal 2002 coming from a chip input. Probe cells comprise a flip-flop 2004 and a multiplexer 2005, as seen in FIGS. 20b and 20c.

[0226] Generation of scan chain specification file: All instances of probe cells must be listed in a scan chain specification file. The scan outputs 2006 (see FIG. 20b) of the chip must also be listed. These outputs must be inserted into a database model of chip logic clusters so that the system router can see them and build appropriate connections. The number of outputs is (P+T−1)/T where P is the number of probe cells and T is a time-division multiplexing factor for probe signals.
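By way of illustration only, the relationship between probe cells and scan outputs can be sketched by grouping the probe cells into chains of at most T cells, one chain per scan output 2006. The grouping of consecutive cells and the function name are assumptions of this sketch; the description specifies only the resulting number of outputs.

```python
# Illustrative sketch; the consecutive grouping and the names are hypothetical.
def build_scan_chains(probe_cells: list[str], t: int) -> list[list[str]]:
    """Split probe cells into chains of at most T cells, one per scan output."""
    if t < 1:
        raise ValueError("multiplexing factor must be at least 1")
    return [probe_cells[i:i + t] for i in range(0, len(probe_cells), t)]
```

Ten probe cells with T=4 yield three chains, which agrees with (P+T−1)/T = (10+4−1)/4 = 3 under integer division.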

[0227] Insertion of event detection cells for signals contributing to events: The signals contributing to events are divided in pairs and each pair is connected to the I0 and I1 inputs of eight copies of an event detection cell 1300, as shown in FIG. 25. A preferred embodiment of an event detection cell has been previously shown in FIG. 20c. The event detection cell 1300 comprises four flip-flops 2010 and a CLB memory 2018. Four multiplexers 2020 and four output buffers 2022 are used to produce four multiplexed event signals 236 (also shown in FIGS. 20c and 20a). If the number of signals is odd, one of the inputs to each of the event detection cells is left unused for the corresponding eight cells.

[0228] Generation of eight balanced AND trees for event detector outputs, and the TDM logic to connect the eight AND trees' outputs to four dedicated event pins: The outputs of event detection cells 1300 are combined using eight balanced AND trees so that one copy of the eight cells created in the previous step is present in each of the trees. The outputs of the trees are time-multiplexed pairwise using special event-multiplexing cells as shown in FIG. 26. This circuitry has also been described in reference to FIG. 20c. AND gates 12 are constructed using wide edge decoders 2012 as shown in FIG. 20c. FIG. 26 shows this circuitry in greater detail.
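By way of illustration only, the bookkeeping behind the two preceding steps can be sketched as follows: event signals are paired, each pair is given eight copies of event detection cell 1300, and copy k of every pair feeds AND tree k. The function name is hypothetical, and the choice of which two trees share each of the four event pins (adjacent trees here) is an assumption, since the description states only that the tree outputs are multiplexed pairwise.

```python
# Illustrative sketch; names and the tree-to-pin pairing are hypothetical.
def plan_event_detectors(signals: list[str]):
    """Return (signal pairs, cells feeding each of 8 AND trees, tree pairs per pin)."""
    # Pair the signals; an odd count leaves one input of the last pair unused.
    pairs = [tuple(signals[i:i + 2]) for i in range(0, len(signals), 2)]
    # Tree k receives copy k of the detection cell built for every pair.
    trees = [[(tree, pair) for pair in pairs] for tree in range(8)]
    # Assumed pairing of the eight tree outputs onto four event pins.
    pin_groups = [(2 * pin, 2 * pin + 1) for pin in range(4)]
    return pairs, trees, pin_groups
```

For five event signals this yields three pairs, twenty-four event detection cells in total, and three inputs on each of the eight AND trees.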

[0229] Generation of event detector download paths and a boundary scan controller: Event detector download circuit 1500 is shown in FIG. 27. It is comprised of a counter 2016 and shift register 2014, together with JTAG controller 1150. JTAG controller 1150 is available as a standard portion of the Xilinx logic chips 10. This circuitry is also shown together with the scan register and event detector in FIG. 20c. The event detector download circuit 1500 produces the WA 1502, WE 1504, DRCLK 1508, and TDI 1506 signals for all event detectors (also shown in FIG. 20c). The event detector counter 2016 generates WA signals 1502 and a clock for shift register 2014, the length of which depends on the number of event decoder circuits. The circuit is shown in FIGS. 20c and 27. In a preferred embodiment, shift register 2014 is generated based on the number of event detectors. It is acceptable, however, to define a maximum number of event detectors per chip and fix the design of the shift register 2014. The PPR program 1022 will trim most of the unused logic.

[0230] Referring back to FIG. 21, board partitioning step 1024 will now be discussed. The function of board partitioning step 1024 is to find chip clusters (a cluster is a collection of interconnected components) with the largest possible number of chips not exceeding the number of logic chips 10, 204 on a single logic board 200 (thirty-seven chips) or a pair of logic boards (seventy-four chips), with the following limitations:

[0231] 1. Total number of ingoing or outgoing nets should not exceed the sum of the I/O connections on two backplane connectors 220 for a pair of logic boards 200 as shown in FIG. 11 (3608 in the presently preferred embodiment) multiplied by a target backplane utilization coefficient. The target backplane utilization coefficient is determined experimentally, and depends on the success the system routing program 1032 is able, on average, to achieve. The target backplane utilization coefficient is expected to be approximately ninety percent.

[0232] 2. The total number of chip outputs marked as logic analyzer channels should not exceed 864 (fifty-four Mux chips 12, multiplied by eight SGRAM 210 pins, the total of which is multiplied by two logic boards 200 in a module).

[0233] 3. The full set of EBM memory instances should fit into no more than twenty-four chips (twelve for a half-size module) (as described earlier with reference to FIG. 11, there are twelve RAMs 208 on a logic board 200 or twenty-four on a pair of logic boards) and the number of logic chips 10 required for EBM memories counts against the total of seventy-four (thirty-seven for a half-size module).

[0234] 4. Total number of CPU cell instances (i.e., the number of CPU instances from the user's design) should not exceed two (one for a half-size module) (as described with reference to FIG. 11, there is one processor 206 per logic board 200 or two on a pair of logic boards).

[0235] 5. Two (one for a half-size module) of the seventy-four (thirty-seven for a half-size module) chips 204 can be used as clock generation logic chips or attached to the microprocessor cells. If microprocessor cells are present, there will be no clock generation logic chips and vice versa because the CoSim logic chip 204 can only be used for one function at a time. However, it is possible that there are neither. In such case, only seventy-two (thirty-six on a single logic board 200) full-capacity logic chips 10 can be used. The two additional CoSim logic chips 204 (one for a half-size module) can then be used to implement additional user logic if clusters with no more than one hundred sixty-two I/O pins are available (see FIG. 11).

[0236] After the appropriate clusters are identified, the full-size clusters are further subdivided into two emulation boards with no more than 1868 (the number of pins on turbo connector 202) inter-board connections. Each board must have no more than half of all critical cluster resources (1804 ingoing or outgoing nets, twelve EBM memories, one microprocessor or clock generation logic chip 204, four hundred thirty-two logic analyzer channels, thirty-seven logic chips 10, 204).
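By way of illustration only, the resource limits applied during board partitioning can be collected into a single feasibility check. The class and function names are hypothetical. The sketch treats a half-size module as having half of each full-size limit and applies the backplane utilization coefficient in both cases, which is an assumption; it also omits the turbo connector limit of 1868 inter-board connections, which applies only after a full-size cluster is split into two boards.

```python
# Illustrative sketch; names are hypothetical and half-size limits are assumed
# to be half of the full-size limits, with the same utilization coefficient.
from dataclasses import dataclass


@dataclass
class Cluster:
    chips: int               # logic chips 10, 204 used by the cluster
    external_nets: int       # ingoing or outgoing nets
    analyzer_channels: int   # chip outputs marked as logic analyzer channels
    ebm_chips: int           # chips needed for EBM memory instances
    cpu_cells: int           # CPU cell instances from the user's design


def fits_module(c: Cluster, half_size: bool = False,
                backplane_utilization: float = 0.9) -> bool:
    """True if the cluster respects the per-module limits listed above."""
    boards = 1 if half_size else 2
    return (c.chips <= 37 * boards
            and c.external_nets <= 1804 * boards * backplane_utilization
            and c.analyzer_channels <= 432 * boards
            and c.ebm_chips <= 12 * boards
            and c.cpu_cells <= 1 * boards)
```

With a ninety percent utilization target, a full-size cluster may have at most 3247 external nets (3608 × 0.9 = 3247.2).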

[0237] EBM compilation step 1030 creates the memory cell instances to be implemented as emulation block memories (EBM). These are created as special cells not to be included in any logic clusters during chip partitioning. An estimation subroutine evaluates how many EBM chips 208 (see FIG. 11) a given set of memory instances requires. This subroutine will be called from hierarchical partition planning program (HPP) 1014 (this connection is not shown in FIG. 21) and board partitioning program 1028 to properly designate a set of memory instances that can be implemented on one board, and the number of logic chips 10 that the memory control circuit will consume. After board partitioning process 1028 is complete, the EBM memory compiler 1030 will create a logic cluster associated with each RAM chip 208 on logic board 200. All lines leading to RAM chip 208 will be marked as being “critical” so that NGD Out program 1020 will not insert time-multiplexing logic into them. They also have properties containing their respective logic chip 10 pin numbers so that system router 1032 can generate correct I/O constraints. EBM logic clusters cannot contain probe signals and cannot generate events because they contain automatically generated logic not accessible to the user.

[0238] In a preferred embodiment, the EBM logic clusters are pre-compiled. This allows the placement and routing time to be saved for these clusters. EBM memory compiler 1030 has been fully described in co-pending application Ser. No. 08/733,352.

[0239] System router 1032 assigns physical wires in the logic chips 10, 204, Mux chips 12, and logic boards 200 to the logic nets (or signals in an emulated design), pairs of logic nets (in two-to-one multiplexing) and the groups of four nets (in four-to-one multiplexing). Following that, it assigns the logic chip 10 pin and a time-division multiplexing (TDM) phase to each signal going in and out of each logic chip 10 and 204.

[0240] It is important when doing system routing to select the optimal route for time-multiplexed signals to minimize the signal delay. The algorithm for doing so is as follows:

[0241] 1. Two-to-One Time-Division Multiplexing (2-1 TDM):

[0242] The optimal route switches TDM phases in each Mux chip 12 but not en route from the physical net source to the physical net destination. Examples of optimal routes are:

[0243] alpha/output/even-beta/input/even-beta/output/odd-alpha/input/odd, or

[0244] alpha/output/even-beta/input/even-beta/output/odd-muxbeta/input/odd-muxbeta/output/even-beta/input/even-beta/output/odd-alpha/input/odd

[0245] Alpha chips are equivalent to logic chips 10 or 204 and beta chips are equivalent to Mux chips 12 in this description. This gives a minimal one cycle delay between two logic chips 10 or 204. The delay may appear to be one-half of a cycle upon examining the logic in FIGS. 3 and 4. It is, in fact, one full cycle because a demultiplexer 34 in logic chips 10, 204 clocks signals close to the end of the half cycle so that the signal is steady in logic chips 10 or 204 on the next half cycle after it is received. If router 1032 fails to find an optimal route, meaning that an appropriate phase MUX output is not available, or an appropriate phase logic chip 10 or 204 input is not available, the signal loses an additional half cycle of delay. The router attempts not to accumulate the misses along the same net, if at all possible. Critical nets are not multiplexed in order to minimize their delay.
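By way of illustration only, the two-to-one phase rule can be checked mechanically on a route written in the chip/direction/phase notation used above. The function names are hypothetical. The sketch counts a miss whenever the phase changes along a wire (an output followed by an input) or fails to change inside a chip (an input followed by an output), which is one reading of the rule that the phase switches in each Mux chip 12 but not en route.

```python
# Illustrative sketch; names and this reading of the phase rule are assumptions.
def parse_route(route: str) -> list[tuple[str, str, str]]:
    """Split 'chip/direction/phase-chip/direction/phase-...' into hops."""
    hops = []
    for hop in route.split("-"):
        chip, direction, phase = hop.split("/")
        hops.append((chip, direction, phase))
    return hops


def count_phase_misses(route: str) -> int:
    """Count the places where a 2-1 TDM route departs from the optimal pattern."""
    hops = parse_route(route)
    misses = 0
    for (_, d1, p1), (_, d2, p2) in zip(hops, hops[1:]):
        if d1 == "output" and d2 == "input" and p1 != p2:
            misses += 1   # phase changed on the wire between two chips
        elif d1 == "input" and d2 == "output" and p1 == p2:
            misses += 1   # phase did not switch inside the Mux chip
    return misses
```

Both example routes given above evaluate to zero misses under this check, while a route that leaves a Mux chip on the same phase it entered on is counted as one miss.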

[0246] 2. Four-to-One Time-Division Multiplexing (4-1 TDM):

[0247] Each physical net always includes one inout pin (IIOO sequence) and one outin pin (OOII sequence). Again, the optimal route switches one time-division multiplexing (TDM) phase in Mux chip 12 but not en route from a physical net source to a physical net destination. Examples of optimal routes are:

[0248] alpha/OI/O1-beta/IO/I1-beta/OI/O2-alpha/IO/I2, or

alpha/OI/O2-beta/IO/I2-beta/IO/O3-alpha/OI/I3, or

alpha/OI/O1-beta/IO/I1-beta/OI/O2-muxbeta/IO/I2-muxbeta/IO/O3-beta/OI/I3-beta/IO/O4-alpha/OI/I4

[0249] This gives a minimal one-half cycle delay alpha-to-alpha. However, one-half cycle of four-to-one time-division multiplexing (4-1 TDM) has the same duration as one cycle of two-to-one time-division multiplexing (2-1 TDM). Therefore, assuming all nets are optimally routed, no speed is lost in four-to-one time-division multiplexing (4-1 TDM) compared to two-to-one time-division multiplexing (2-1 TDM). However, misses (i.e., failures to find an optimal route, as discussed above) in four-to-one time-division multiplexing (4-1 TDM) routing have more severe consequences than in two-to-one time-division multiplexing (2-1 TDM) routing. For example, the path:

[0250] alpha/OI/O1-beta/IO/I1-beta/OI/O1-alpha/IO/I1

[0251] will delay the signal by 1.25 four-to-one time-division multiplexing (4-1 TDM) cycles (or 2.5 two-to-one time-division multiplexing (2-1 TDM) cycles), which is two and one-half times worse than an optimal delay. In every hop through a Mux chip 12, router 1032 can miss by 0, ¼, ½, or ¾ of a four-to-one time-division multiplexing (TDM) cycle depending on what input-output pair the router selects. Router 1032 makes every attempt to miss as little as possible. Thus, to minimize their delay, critical nets should not be multiplexed.
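The delay figures quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions rather than the router's method: the formula in `hop_miss`, which charges a quarter cycle for each phase the Mux chip output lags behind the next available phase, is inferred from the quoted miss values of 0, ¼, ½ and ¾, and the function names are hypothetical.

```python
from fractions import Fraction

PHASES = 4                       # four-to-one TDM
OPTIMAL_DELAY = Fraction(1, 2)   # 4-1 TDM cycles, alpha-to-alpha, per the text


def hop_miss(in_phase, out_phase):
    """Assumed miss, in 4-1 TDM cycles, for one hop through a Mux chip.

    The optimal output phase is taken to be the one right after the input
    phase; every further phase of waiting costs a quarter cycle.
    """
    return Fraction((out_phase - in_phase - 1) % PHASES, PHASES)


def route_delay(mux_hops):
    """Total delay in 4-1 TDM cycles for (input phase, output phase) hops."""
    return OPTIMAL_DELAY + sum(hop_miss(i, o) for i, o in mux_hops)


def to_2_1_cycles(cycles_4_1):
    """One 4-1 TDM cycle lasts as long as two 2-1 TDM cycles."""
    return cycles_4_1 * 2


# The non-optimal path from the text: the Mux chip takes I1 and drives O1.
bad = route_delay([(1, 1)])
# An optimal hop: the Mux chip takes I1 and drives O2.
good = route_delay([(1, 2)])
```

Here `bad` evaluates to 5/4 of a 4-1 TDM cycle, which is 5/2 in 2-1 TDM cycles and two and one-half times the optimal `good` value of 1/2.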

[0252] Some logic chips 10 or 204 have input/output nets locked to specific pins. Examples are Mux Clock signals (MUXCLK) 44, Trace Clock Signals 2002, connections between co-simulation logic chip 204 and a processor 206 (see FIG. 11), connections between memory controller logic chips 10 and RAM chips 208, event signal outputs 236, etc. These connections do not need to be routed but have to be included in the logic chip 10, 204 pin constraints data. Additional programming is also required for a clock distribution circuit (Mux chip 12) on control module 600 (shown in FIG. 19). This is a part of a clock circuit used to select no more than eight user clocks reaching each of the logic modules.

[0253] NGD update program 1034 supplies final parallel partition, place and route (PPR) software 1036 with the information about the actual pin I/O assignments produced by system router 1032. For non-time-multiplexed designs this is just an assignment of signals to I/O pads.

[0254] For time-multiplexed designs, TDM logic on the periphery of logic chips 10, 204 and Mux chips 12 is also added.

[0255] Final parallel partition, place and route (PPR) program 1036 reruns the PPR program in an incremental mode to reroute the I/O pins at the periphery of the chip. As stated earlier, the PPR program is available from Xilinx Corporation. The rerouting changes logic chip 10, 204 configuration files previously produced at preliminary PPR step 1022 and fixes the pinout as determined by system routing step 1032.

[0256] Thus, a preferred method and apparatus for emulating, verifying and analyzing an integrated circuit has been described. While embodiments and applications of this invention have been shown and described, as would be apparent to those skilled in the art, many more embodiments and applications are possible without departing from the inventive concepts disclosed herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.

We claim:
 1. A logic analyzer integrated in a hardware logic emulation system that emulates a logic design, the logic emulation system comprising a plurality of logic chips, the plurality of logic chips being interconnected to each other by a plurality of interconnect chips, the logic design comprising combinational logic elements and sequential logic elements, the logic analyzer comprising: at least one scan chain programmed into each of said plurality of logic chips, said at least one scan chain comprised of a flip-flop, said at least one scan chain programmably connectable to outputs of a selected subset of the sequential logic elements of the logic design; at least one memory device, said at least one memory device in communication with said at least one scan chain and storing data from the sequential logic elements of the logic design; and control circuitry, said control circuitry in communication with said plurality of logic chips, said control circuitry generating logic analyzer clock signals and trigger signals, said logic analyzer clock signals clocking said at least one scan chain, said trigger signals being generated when a predetermined combination of signals occur in said plurality of logic chips.
 2. The logic analyzer of claim 1 further comprising a means for calculating states of the combinational logic elements in the logic design from data stored in said at least one memory device.
 3. The logic analyzer of claim 1 wherein each of said plurality of logic chips comprise field programmable gate arrays.
 4. The logic analyzer of claim 3 wherein said at least one scan chain is programmed into configurable logic cells within said field programmable gate arrays.
 5. The logic analyzer of claim 4 wherein said at least one scan chain is programmably connected to outputs of said selected subset of the sequential logic elements of the logic design using configurable routing resources within said field programmable gate arrays.
 6. The logic analyzer of claim 1 wherein said at least one memory device communicates with said at least one scan chain through the plurality of programmable interconnect chips.
 7. The logic analyzer of claim 1 wherein said logic chips have event logic implemented therein which computes said predetermined combination of signals.
 8. The logic analyzer of claim 3 wherein the plurality of field programmable gate arrays has event logic programmed therein which computes said predetermined combination of signals.
 9. A logic analyzer integrated in a hardware logic emulation system that emulates a logic design, the logic emulation system comprising a plurality of logic chips, the plurality of logic chips being programmably interconnected to each other, the logic design comprising combinational logic elements and sequential logic elements, the logic analyzer comprising: at least one scan chain programmed into each of said plurality of logic chips, said at least one scan chain comprised of a flip-flop, said at least one scan chain programmably connectable to outputs of a selected subset of the sequential logic elements of the logic design; at least one memory device, said at least one memory device in communication with said at least one scan chain and storing data from the sequential logic elements of the logic design; and control circuitry, said control circuitry in communication with said plurality of logic chips, said control circuitry generating logic analyzer clock signals and trigger signals, said logic analyzer clock signals clocking said at least one scan chain, said trigger signals being generated when a predetermined combination of signals occur in said plurality of logic chips.
 10. The logic analyzer of claim 9 further comprising a means for calculating states of the combinational logic elements in the logic design from data stored in said at least one memory device.
 11. The logic analyzer of claim 9 wherein each of said plurality of logic chips comprise field programmable gate arrays.
 12. The logic analyzer of claim 11 wherein said at least one scan chain is programmed into configurable logic cells within said field programmable gate arrays.
 13. The logic analyzer of claim 12 wherein said at least one scan chain is programmably connected to outputs of said selected subset of the sequential logic elements of the logic design using configurable routing resources within said field programmable gate arrays.
 14. The logic analyzer of claim 9 wherein said at least one memory device communicates with said at least one scan chain through the plurality of programmable interconnect chips.
 15. The logic analyzer of claim 9 wherein said logic chips have event logic implemented therein which computes said predetermined combination of signals.
 16. The logic analyzer of claim 11 wherein the plurality of field programmable gate arrays has event logic programmed therein which computes said predetermined combination of signals.